In connection with a Sequence Listing submitted concurrently 
herewith, the undersigned hereby states that: 

1. the submission, filed herewith in accordance with 37 
C.F.R. § 1.821(g), does not include new matter; 

2. the content of the attached paper copy and the 
attached computer readable copy of the Sequence Listing, submitted in 
accordance with 37 C.F.R. § 1.821(c) and (e), respectively, are the same; 

3. all statements made herein of their own knowledge are 
true and that all statements made on information and belief are believed to 
be true; and further, that these statements were made with the knowledge 
that willful false statements and the like so made are punishable by fine 
or imprisonment, or both, under Section 1001 of Title 18 of the United 
States Code and that such willful false statements may jeopardize the 
validity of the application or any patent resulting therefrom. 



HARBOR CONSULTING 

Intellectual Property Services 
1500A Lafayette Road 
Suite 262 
Portsmouth, N.H. 
800-318-3021 



Respectfully submitted, 
IN THE UNITED STATES DESIGNATED/ELECTED OFFICE 

International Application No. : PCT/GB00/01015 

International Filing Date : 17 MARCH 2000 

Priority Date(s) Claimed : 23 MARCH 1999 

Applicant(s) (DO/EO/US) : CARR, Francis, J. et al 

Title: PROTEIN ISOLATION AND ANALYSIS 

PRELIMINARY AMENDMENT 

Commissioner for Patents 
Washington, D.C. 20231 

SIR: 

Prior to calculating the national fee, and prior to examination in the National Phase of 
the above-identified International application, please amend as follows: 

IN THE CLAIMS : 

5. (Amended) A method as claimed in claim 3 wherein the library of proteins is 
brought into contact/association with one or more target moieties, eg target proteins. 

7. (Amended) A method as claimed in claim 5 wherein after binding, the 
complexes of protein/target moiety are isolated, followed by digestion with endoprotease to 
release the "barcode" sequence or sequences. 

10. (Amended) A method as claimed in claim 1 wherein the library of proteins is 
a library of antibodies. 

14. (Amended) A method as claimed in claim 10 wherein the "barcode" sequence 
is C-terminal to the Fv sequence. 

15. (Amended) A library of proteins as defined in claim 1. 

20. (Amended) A method as claimed in claim 17 wherein the target is a complex 
mixture, eg a mixture of molecules, whole cells or cell membranes. 

23. (Amended) A method as claimed in claim 21 wherein after screening for 
binding to the target the library is dereplicated to identify one or more proteins with a 
desirable property, proteins which bind to the target. 

24. (Amended) A method as claimed in claim 21 where the "associating moiety" 
is a particle. 

26. (Amended) A method as claimed in claim 21 wherein the "associating 
moiety" is a protein or protein complex. 

28. (Amended) A method as claimed in claim 21 wherein the "associating 
moiety" is a bispecific binding molecule capable of binding to both the proteins and genes. 

29. (Amended) A method as claimed in claim 21 wherein the "associating 
moiety" is a living cell or cellular virus such as a bacteria or bacteriophage. 

30. (Amended) A method as claimed in claim 21 wherein one or other molecules 
which alter the properties of the proteins in the library are bound to the "associating moiety". 

31. (Amended) A method as claimed in claim 21 wherein the genes encoding the 
proteins in the library are attached to the "associating moiety" prior to synthesis of the 
individual proteins. 

32. (Amended) A method as claimed in claim 21 wherein the library of proteins is 
a library of antibody proteins, eg a library of antibody domains such as Fvs. 

40. (Amended) A method as claimed in claim 37 wherein the library of protein 
binding agents is a library of antibodies or antibody fragments. 

41. (Amended) A method as claimed in claim 37 wherein the protein binding 
agents are major histocompatibility proteins, T cell receptors and natural proteins or protein 
domains involved in protein-protein binding interactions, such as SHI domains. 

42. (Amended) A method as claimed in claim 40 wherein the library of protein 
binding agents is pre-selected for binding to one or more proteins or peptides derived from 
the protein mixture or a related protein mixture under analysis. 

44. (Amended) A method as claimed in claim 36 wherein the protein mixture is 
initially bound to a solid phase prior to digestion or cleavage either via the N or C-terminus or 
via specific amino acids or via specific sequences of amino acids. 

45. (Amended) A method as claimed in claim 36 wherein specific amino acids or 
modified amino acids found in the proteins are derivatised prior to binding to a solid phase, 
such binding occurring either before or after digestion or cleavage of the protein mixtures. 

48. (Amended) A method as claimed in claim 36 wherein specific naturally 
modified amino acids found in the proteins are bound to a solid phase using modification 
specific affinity reagents, such binding occurring either before or after digestion or cleavage 
of the protein mixtures. 

49. (Amended) A method as claimed in claim 45 wherein more than one cycle of 
digestion/cleavage and derivatisation is carried out. 

51. (Amended) A method as claimed in claim 36 wherein peptides released after 
digestion/cleavage are fractionated using physical methods such as HPLC before or after 
fractionation using protein binding agents. 

REMARKS 



The purpose of this Preliminary Amendment is to eliminate multiple dependent 
claims in order to avoid the additional fee. Applicants reserve the right to reintroduce claims 
to canceled combined subject matter. 

Attached hereto is a marked-up version of the changes made to the claims by the 
current amendment. The attached pages are captioned "Version With Markings to Show 
Changes Made". 

Respectfully submitted, 




Anthony J. Zelano, Reg. No. 27,969 
Attorney for Applicants 

MILLEN, WHITE, ZELANO & BRANIGAN, P.C. 
Arlington Courthouse Plaza 1 
2200 Clarendon Boulevard, Suite 1400 
Arlington, VA 22201 
Direct Dial: 703-812-5311 
Facsimile: 703-243-6410 
AJZ:jmm Email: zelano@mwzb.com 
VERSION WITH MARKINGS TO SHOW CHANGES MADE 

Claims 5, 7, 10, 14-15, 20, 23-24, 26, 28-32, 40-42, 44-45, 48-49 and 51 have been amended 
as follows: 

5. (Amended) A method as claimed in claim 3 or claim 4 wherein the library of 
proteins is brought into contact/association with one or more target moieties, eg target 
proteins. 

7. (Amended) A method as claimed in claim 5 or claim 6 wherein after binding, 
the complexes of protein/target moiety are isolated, followed by digestion with endoprotease 
to release the "barcode" sequence or sequences. 

10. (Amended) A method as claimed in any one of claims 1 to 9 wherein the 
library of proteins is a library of antibodies. 

14. (Amended) A method as claimed in any one of claims 10 to 13 wherein the 
"barcode" sequence is C-terminal to the Fv sequence. 

15. (Amended) A library of proteins as defined in any one of claims 1 to 14. 

20. (Amended) A method as claimed in claim 17 or claim 18 wherein the target is 
a complex mixture, eg a mixture of molecules, whole cells or cell membranes. 

23. (Amended) A method as claimed in claim 21 or claim 22 wherein after 
screening for binding to the target the library is dereplicated to identify one or more proteins 
with a desirable property, proteins which bind to the target. 

24. (Amended) A method as claimed in any one of claims 21 to 23 where the 
"associating moiety" is a particle. 

26. (Amended) A method as claimed in any one of claims 21 to 23 wherein the 
"associating moiety" is a protein or protein complex. 

28. (Amended) A method as claimed in claim 21 or claim 22 wherein the 
"associating moiety" is a bispecific binding molecule capable of binding to both the proteins 
and genes. 

29. (Amended) A method as claimed in any one of claims 21 to 23 wherein the 
"associating moiety" is a living cell or cellular virus such as a bacteria or bacteriophage. 

30. (Amended) A method as claimed in any one of claims 21 to 29 wherein one or 
other molecules which alter the properties of the proteins in the library are bound to the 
"associating moiety". 

31. (Amended) A method as claimed in any one of claims 21 to 30 wherein the 
genes encoding the proteins in the library are attached to the "associating moiety" prior to 
synthesis of the individual proteins. 

32. (Amended) A method as claimed in any one of claims 21 to [illegible] wherein the 
library of proteins is a library of antibody proteins, eg a library of antibody domains such as 
Fvs. 

40. (Amended) A method as claimed in any one of claims 37 to 39 wherein the 
library of protein binding agents is a library of antibodies or antibody fragments. 

41. (Amended) A method as claimed in any one of claims 37 to 39 wherein the 
protein binding agents are major histocompatibility proteins, T cell receptors and natural 
proteins or protein domains involved in protein-protein binding interactions, such as SHI 
domains. 

42. (Amended) A method as claimed in claim 40 or claim 41 wherein the library 
of protein binding agents is pre-selected for binding to one or more proteins or peptides 
derived from the protein mixture or a related protein mixture under analysis. 

44. (Amended) A method as claimed in any one of claims 36 to 43 wherein the 
protein mixture is initially bound to a solid phase prior to digestion or cleavage either via the 
N or C-terminus or via specific amino acids or via specific sequences of amino acids. 

45. (Amended) A method as claimed in any one of claims 36 to [illegible] wherein 
specific amino acids or modified amino acids found in the proteins are derivatised prior to 
binding to a solid phase, such binding occurring either before or after digestion or cleavage of 
the protein mixtures. 

48. (Amended) A method as claimed in any one of claims 36 to 43 wherein 
specific naturally modified amino acids found in the proteins are bound to a solid phase using 
modification specific affinity reagents, such binding occurring either before or after digestion 
or cleavage of the protein mixtures. 

49. (Amended) A method as claimed in any one of claims 45 to 48 wherein more 
than one cycle of digestion/cleavage and derivatisation is carried out. 

51. (Amended) A method as claimed in any one of claims 36 to 50 wherein 
peptides released after digestion/cleavage are fractionated using physical methods such as 
HPLC before or after fractionation using protein binding agents. 
PROTEIN ISOLATION AND ANALYSIS 

The present invention relates to the isolation and analysis of proteins, especially by 
mass analysis. The invention has particular application to the isolation of binding 
proteins such as antibodies. The invention also provides for modification of proteins 
or protein fragments in order to facilitate mass analysis and/or the isolation of specific 
proteins encoded by members of a gene library. 

For the isolation of proteins, the invention provides new methods for isolating specific 
proteins from a complex mixture of such proteins by virtue of binding to a specific 
target. In particular, the invention provides methods for isolating specific antibody 
domains from a gene library-derived mixture of such domains by virtue of binding to a 
specific target antigen. For the analysis of proteins, the invention provides new 
methods for analysing complex mixtures of proteins, especially to compare proteins 
between two or more different samples. 

For the isolation of proteins from complex mixtures by virtue of binding to a specific 
target and where the identity or amino acid sequence of the protein is unknown 
beforehand, it has usually been very difficult to isolate enough protein which binds to 
the target for direct characterisation of the protein. In order to select a protein of 
interest from a large library of natural, synthetic or semi-synthetic proteins, "protein 
display" methods have been developed whereby recombinant proteins are produced 
physically linked to their genes such that recovery of the proteins allows subsequent 
rapid recovery of the genes. Such methods include "in-vivo" display methods such as 
display on bacteriophage ("phage display"), bacteria and yeast, and include "in-vitro" 
display methods such as display on ribosomes ("ribosome display"). The recovered 
genes can be sequenced in order to determine the identity of the recovered protein or 
can be used to regenerate the recovered protein. If a library of genes is subject to 
protein display methods whereby proteins are selected for a particular characteristic 
such as binding to an antigen (for antibody variable regions), then at each selection 
round, the recovered genes will be enriched for those encoding proteins exhibiting 
such particular characteristics. Disadvantages of current "in-vivo" display methods 
include a limit to the amount of functional protein displayed (phage display is usually 
limited to polypeptides of less than 40kDa), the usual need to fuse the recombinant 
protein to a host protein (which may interfere with the function or binding of the 
recombinant protein), and an inability to vary the number of proteins displayed per 
display particle; the latter is also a problem with "in-vitro" display methods such as 
ribosome display. In addition, methods for the selection of proteins with particular 
characteristics such as binding to an antigen are limited due to the small sizes of the 
display particles such that methods such as fluorescence activated cell sorting (FACS) 
cannot readily be used. Thus, there remains a need for new methods to improve the 
isolation of proteins from complex mixtures, in particular to improve the isolation of 
antibody variable regions (Fv's) from complex mixtures of Fv's. 

Thus, the present invention provides for improved methods for isolation of proteins from complex 
mixtures. In particular, the present invention combines the use of protein libraries 
generated from gene libraries with improvements in mass spectrometry and especially 
improvements in matrix-assisted laser desorption/ionisation time-of-flight (MALDI-ToF) 
spectrometry and the ability to directly sequence ToF-separated peptides by 
tandem mass spectrometry (MS/MS) and, more recently, the ability to combine ToF 
and MS/MS into one device (Q-ToF) and the ability to combine HPLC and 
electrospray (ES) tandem mass spectrometry. The present invention also includes new 
methods for screening for individual proteins from complex protein mixtures whereby 
these proteins are not "displayed" i.e. bound to their corresponding genes either during 
or after binding to the target. The present invention also includes new methods for 
screening for individual proteins from complex protein mixtures whereby neither the 
proteins nor the target are "displayed" i.e. bound to any other molecule or structure. 
The present invention also includes new methods for screening for individual proteins 
from complex protein mixtures whereby the proteins and their corresponding genes 
are linked together via the addition or inclusion of an "associating moiety" whereby 
the proteins bind to the target either before or after addition of the "associating 
moiety". 

Thus, in a first aspect, the present invention provides a method of protein 
identification, screening and/or sequencing comprising providing a library of 
individual proteins, one or more of which may bind to a target of interest, wherein 
each individual protein includes in its sequence a "barcode" sequence, which can be 
used to identify each individual protein in the library. 

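As a rough illustration of the barcode principle, the sketch below builds a toy library in which each member carries a C-terminal barcode behind an enterokinase site (Asp-Asp-Asp-Asp-Lys, one of the protease-sensitive sites this aspect mentions), releases the barcode in silico, and looks up which member it identifies. The protein sequences, barcode strings, member names and function names are all hypothetical and chosen only for illustration; real digestion and mass analysis are not modelled.

```python
# Toy illustration only: sequences, barcodes and names below are hypothetical.
ENTEROKINASE_SITE = "DDDDK"  # enterokinase cleaves after Asp-Asp-Asp-Asp-Lys


def make_member(body: str, barcode: str) -> str:
    """Return a library member: protein body, protease site, C-terminal barcode."""
    return body + ENTEROKINASE_SITE + barcode


def release_barcode(member: str) -> str:
    """Simulate digestion: return the peptide C-terminal to the last cleavage site."""
    cut = member.rfind(ENTEROKINASE_SITE)
    if cut == -1:
        raise ValueError("no protease-sensitive site found")
    return member[cut + len(ENTEROKINASE_SITE):]


def build_index(library: dict) -> dict:
    """Map each released barcode back to the name of the member that carries it."""
    return {release_barcode(seq): name for name, seq in library.items()}


library = {
    "Fv-001": make_member("QVQLVESGGG", "NTRLFQDH"),
    "Fv-002": make_member("EVQLLESGGG", "DPGMCEVL"),
    "Fv-003": make_member("QVQLQQSGAE", "HAWVFKAP"),
}
index = build_index(library)

# Pretend that only Fv-002 stayed bound to the target and was then digested.
detected = release_barcode(library["Fv-002"])
print(detected, "->", index[detected])
```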
This aspect of the present invention provides for libraries of proteins, especially 
recombinant antibody domains such as Fv's, whereby individual protein members of 
the library include, within their amino acid sequence, a tract of sequence (a "barcode") 
which can subsequently be sequenced in order to identify which protein(s) has bound 
to the specific target (or, in the case of Fv's, "antigen"). This embodiment will apply 
especially where the Fv's are derived from human genes whereby the selected Fv may 
be suitable for human therapeutic or diagnostic use. In this particular application, an 
extensive gene library of Fv's is created from a pool of immunoglobulin cDNA's such 
as those derived from peripheral blood B cells in humans or such as pools created 
synthetically using human variable regions with semi-randomised ("combinatorial") 
CDRs (complementarity-determining regions) at one or more positions. If this gene 
library is created in such manner that a random (or semi-random) gene sequence is 
included within the Fv coding region or terminal to this region, then such a 
random/semi-random gene sequence will generate a random/semi-random peptide 
sequence associated with individual Fv's. Such a random/semi-random gene sequence 
is created using standard methods such as oligonucleotide priming/DNA polymerase 
extension or PCR whereby a random/semi-random synthetic oligonucleotide sequence 
is used as one of a pair of primers used to amplify immunoglobulin gene fragments 
during the creation of the Fv gene library. If members of the Fv library comprise two 
chains (i.e. heavy and light chain-derived chains (VH and VL)) as opposed to a single 
chain (VH and VL joined by a peptide linker), then individual barcodes can be 
associated with each of the chains (or can be associated with one of the chains only). 
Upon creation of the library, the resultant Fv's each include one or more "peptide 
barcodes" unique to that particular Fv or to a small subset of Fv's from within the 
complex library. Preferably, the peptide barcode is C terminal to the single-chain Fv 
region or C terminal to the VH or VL or both and includes, flanked between itself and 
the Fv region, one or more protease sensitive sites such as sites for enterokinase 
(cleaves after Asp-Asp-Asp-Asp-Lys), Factor Xa (cleaves after Ile-Glu/Asp-Gly-Arg) 
or other endopeptidases. If a mixture of such Fv's is produced from a suitable gene 
library, then this mixture is mixed with a target antigen (or antigens such as on cells), 
usually where the antigen is immobilised. This results in specific Fv's binding to the 
target antigen with non-binders (or weak binders depending on the stringency of 
washing) being washed away. Having washed away excess antibodies, the barcode 
peptide of the remaining antigen/Fv complex is then usually released from the Fv by 
digestion with the endoprotease used to cleave the introduced protease sensitive site. 
This released barcoded peptide is then subjected to mass analysis / mass spectrometry 
sequencing either directly or, if desired, following capture by virtue of specific amino 
acids or amino acid sequences which allow the peptide to be captured onto a solid 
phase, such as cysteine residues which can be biotinylated for subsequent capture on 
immobilised avidin or streptavidin. Alternatively, any other method can be employed to 
determine the sequence of the peptide barcode either within the Fv or after release, 
including using specific ligands which bind to the barcode in a sequence-specific manner. 
Having determined the sequences (or part-sequence) of barcodes derived from bound 
Fv's, corresponding synthetic oligonucleotides are then produced and used to 
specifically amplify or enrich for specific Fv genes from the library. These specific or 
enriched Fv (or VH and VL) genes are then further used to generate corresponding 
Fv's which could then be retested for antigen binding either individually or as part of a 
small pool of isolated Fv's. Ultimately, by this method, specific Fv's can be generated 
with desirable antigen binding properties and, if from a human source, potential 
clinical utility. This aspect also encompasses the use of multiple barcodes associated 
with individual proteins or Fv's, for example two adjacent barcodes at the C terminus 
of Fv's whereby two peptides are released from each Fv by protease digestion, either 
simultaneously in order to enhance the identity of Fv's which bind to the target, or 
sequentially whereby different proteases are used in successive rounds of digestion to 
provide a different means to subsequently amplify Fv genes corresponding to Fv's 
which bind to the target. This aspect also encompasses the use of multiple barcodes 
which are analysed at the same time in order to increase the diversity of overall 
barcode sequences to provide specific coding of individual proteins. This aspect also 
encompasses the use of barcodes within individual proteins, for example within one or 
more CDR positions of an Fv. This aspect also encompasses the use of proteases 
which might also digest the protein components of the protein:target mixture or, 
additionally, any protein agent used to immobilise the target, with the proviso that the 
barcode peptides released from the bound test protein(s) can still be detected and 
sequenced within the background of other peptides. 

In the preferred format of this 
aspect, a single region of barcode is provided at the C terminus of the light chains 
forming a soluble Fab fragment whereby VHs and VLs are encoded by the same 
expression cassette or cistron such that the barcode sequence can be used to access 
both VH and VL genes. Such an Fab fragment can be conveniently produced using a 
range of expression systems, for example the M13 bacteriophage vector system where, 
by introduction of secretory leader sequences, the heavy and light chains of Fabs are 
secreted into the periplasmic space of the host bacteria and harvested from that space. 
The vector system is first prepared with in-frame barcodes by cloning in mixtures of 
synthetic oligonucleotides. For the formation of two adjacent barcodes, this is 
conveniently undertaken by sequential cloning or oligonucleotide mutagenesis 
whereby pooled M13 recombinants containing the first mixed barcode are prepared as 
a template for subsequent cloning of the second in-frame barcode. Preferably, the 
barcoding is designed such that the encoded protein contains endonuclease sites both 
flanking and between the two barcodes and also whereby a "spacer" region adjacent to 
one of the barcodes creates a peptide including that barcode which has a higher 
molecular weight than the other barcode. By judicious design of barcodes and the use 
of multiple barcodes in this manner, there is provided an option to simply analyse 
masses of endoprotease-released peptides by, for example, MALDI-ToF whereby the 
sequences of the peptides can be deduced (or near deduced) such that synthetic 
oligonucleotides can be designed to isolate (or enrich) for the specific proteins with 
the barcode(s) detected by MALDI-ToF analysis. Such deduction of these sequences 
is achieved by design of sequences whereby specific amino acids only occur in one or 
two positions along the peptide. For example, where the peptide is designed using 17 
of the 20 natural amino acids (hereby designated A-Q), then the sequences might be 
designed with options for any of three amino acids at each position along the peptide 
sequence as follows: 



aa position:            1    2    3    4    5    6    7    8

amino acid options:     A    C    E    G    I    K    M    O
                        B    D    F    H    J    L    N    P
                        C    E    G    I    K    M    O    Q


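The positional design in the table can be enumerated directly. The sketch below is a minimal illustration, not the applicants' procedure: it treats the letters A-Q as abstract stand-ins for 17 distinct amino acids (which real residues they denote is left open), generates every barcode allowed by the table, and checks two properties the text relies on, namely that each letter occurs at no more than two positions and that no two barcodes share the same composition. The final figure assumes one reading of a "two options per position" design, two options at each of 18 positions, which is consistent with 19 amino acids under the same overlapping scheme.

```python
from itertools import product

# Letters A-Q stand for 17 distinct amino acids, as in the text; which real
# residues they map to is left open here.
LETTERS = "ABCDEFGHIJKLMNOPQ"
N_POSITIONS = 8

# Position i (0-based) offers letters 2i, 2i+1, 2i+2: A/B/C, C/D/E, ..., O/P/Q.
OPTIONS = [LETTERS[2 * i: 2 * i + 3] for i in range(N_POSITIONS)]

barcodes = ["".join(p) for p in product(*OPTIONS)]

# Each letter should occur at no more than two positions.
positions_per_letter = {
    letter: [i for i, opts in enumerate(OPTIONS) if letter in opts]
    for letter in LETTERS
}

# Group barcodes by composition (letters sorted, order ignored) to see how
# many sequences remain possible once only the composition is known.
by_composition = {}
for bc in barcodes:
    by_composition.setdefault("".join(sorted(bc)), []).append(bc)

print(OPTIONS)
print(len(barcodes))                                   # 3 ** 8
print(len(barcodes) ** 2)                              # two adjacent barcodes
print(2 ** 18)                                         # assumed 2-option design
print(max(len(v) for v in by_composition.values()))    # sequences per composition
```

Composition fixes the sequence in this design because the options advance along the peptide in alphabetical order. Going from a measured mass to a composition is a separate step that the sketch does not model, and it need not be unambiguous for real residues, some of which share the same mass.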

This design would give a theoretical 6561 different peptide sequence barcodes. If an 
adjacent barcode with a spacer region is also designed on the same basis, then this 
would give an additional 6561 different barcodes. In combination (6561 × 6561), this 
would create approximately 4 × 10^7 barcode sequences which would be adequate to 
uniquely tag most members of a protein library of such size. The use of additional 
adjacent barcodes or longer barcodes based, for example, on use of two specific amino 
acids at any position in the sequence (thus creating 262,144 different barcode 
sequences using 19 amino acids) would increase the diversity of barcodes provided. 
In practice, codon redundancy is reduced through the judicious choice of codons at 
each position in the sequence during design of mixed synthetic oligonucleotides. One 
design of oligonucleotide for an 8 amino acid barcode peptide for MS/MS sequencing 
is as follows: 
Codons -         NAC   NCC   NGG   NTG   TKC   VAG   GNV   CNT

Amino acids -    N     T     R     L     F     Q     D     H
                 D     P     G     M     C     E     V     L
                 H     A     W     V           K     A     P
                 Y     S                             G     R

where, in the codons,   N = A, C, G or T
                        K = G or T
                        V = A, C or G

4 × 4 × 3 × 3 × 2 × 3 × 4 × 4 = 13824 barcode sequences

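The degenerate codons in the table can be expanded mechanically. The following sketch is an illustration under the standard genetic code, with function and variable names of my own choosing: it expands each degenerate codon into its concrete codons, translates them, and prints the result next to the residues listed in the table. The barcode count is then taken as the product of the listed options per position. For GNV, expansion under the standard code also returns glutamate (GAA, GAG) alongside the four listed residues; the count below follows the table as printed.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Degenerate base symbols used in the table.
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T",
              "N": "ACGT", "K": "GT", "V": "ACG"}


def expand(codon):
    """List the concrete codons covered by one degenerate codon."""
    return ["".join(c) for c in product(*(DEGENERATE[b] for b in codon))]


def encoded(codon):
    """Set of amino acids (one-letter codes) encoded by a degenerate codon."""
    return {CODON_TABLE[c] for c in expand(codon)}


DESIGN = ["NAC", "NCC", "NGG", "NTG", "TKC", "VAG", "GNV", "CNT"]
# Options per position as listed in the table.
LISTED = ["NDHY", "TPAS", "RGW", "LMV", "FC", "QEK", "DVAG", "HLPR"]

for codon, listed in zip(DESIGN, LISTED):
    print(codon, len(expand(codon)), "codons ->",
          "".join(sorted(encoded(codon))), "| listed:", listed)

n_barcodes = prod(len(opts) for opts in LISTED)
print(n_barcodes)
```

Positions such as NGG and NTG show the redundancy the text refers to: four codons are synthesised but only three distinct residues result, because two of the codons encode the same amino acid.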

Specific codons can also be incorporated by discontinuous oligonucleotide synthesis 
whereby specific codons are added sequentially to separated mixtures of previously 
synthesised oligonucleotides ("codon mutagenesis"). Once a candidate barcode 
sequence is deduced by MALDI-ToF or MS/MS and where the diversity of individual 
barcodes is less than that of the library, the corresponding specific oligonucleotide (or 
mixture of oligonucleotides if there is redundancy in codon usage) can be used as a 
PCR primer in conjunction with an opposite primer designed from the protein or 
vector system to enrich for genes encoding the protein from which the barcode was 
detected. Where adjacent barcodes are used, a second primer nested within a gene 
fragment created from the first primer can then be used to enrich for the gene encoding 
the actual protein detected. If required, the above method can incorporate three or 
more barcodes in order to increase the specificity of oligonucleotide-directed 
enrichment of specific genes encoding the desired protein. It will be understood by 
those skilled in the art that this first aspect of the invention can cover a number of 
variant methods with the underlying principle that a specific protein is recovered from 
a library of such proteins via mass or sequence analysis of one or more peptides 
associated with or encoded by that specific protein and, as such, that this aspect has a 
broad utility in isolating genes encoding desired proteins where only peptide sequence 
is determined or deduced. 

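Going from a deduced barcode peptide back to a specific oligonucleotide, or a mixture of them, can be sketched as a constrained back-translation. The example below is only an illustration under the standard genetic code: for each residue of the barcode it keeps the concrete codons that both fall within the degenerate codon used at that position in the 8-residue design and encode that residue, then lists every resulting coding-strand sequence. The function names are hypothetical, and the sketch leaves out primer orientation (sense strand or reverse complement), flanking sequence and melting-temperature considerations.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T",
              "N": "ACGT", "K": "GT", "V": "ACG"}

# Degenerate codon used at each of the eight barcode positions.
DESIGN = ["NAC", "NCC", "NGG", "NTG", "TKC", "VAG", "GNV", "CNT"]


def codons_for(residue, degenerate_codon):
    """Concrete codons within one degenerate codon that encode `residue`."""
    concrete = ("".join(c)
                for c in product(*(DEGENERATE[b] for b in degenerate_codon)))
    return [c for c in concrete if CODON_TABLE[c] == residue]


def oligos_for(barcode):
    """All coding-strand sequences in the design that encode `barcode`."""
    per_position = [codons_for(aa, dc) for aa, dc in zip(barcode, DESIGN)]
    if len(barcode) != len(DESIGN) or any(not opts for opts in per_position):
        raise ValueError("barcode is not encoded by this design")
    return ["".join(p) for p in product(*per_position)]


oligos = oligos_for("NTRLFQDH")
for seq in oligos:
    print(seq)
```

For this barcode two positions are redundant (Arg within NGG, Leu within NTG), so a mixture of four oligonucleotides would be needed to cover every library member carrying it.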

It will be understood by those skilled in the art that, within the scope of the present 
invention, there are many variations of the first aspect. For example, it will be 
understood that peptide barcodes could be incorporated into pairs or groups of proteins 
which are then allowed to bind in order to determine which proteins bind to each 
other by virtue of detecting barcodes from each of the proteins engaged in binding. As 
an alternative to isolation of proteins from complex mixtures, proteins within these 
complex mixtures which demonstrate certain binding properties such as binding to 
other macromolecules such as DNA can be detected using the present method. The 
present method includes a variety of ways for adding peptide barcodes to proteins 
including methods where the barcode is encoded within the gene fragment encoding 
the protein. However, barcodes can be added to such proteins or to any other suitable 
mixture of molecules by direct attachment of peptides. For example, specific peptides 
can be added to specific antibodies or proteins using a range of chemical or 
photochemical methods. One application of such a method is to label one complex 
mixture of proteins with one barcode (or selection of barcodes, for example with 
different protein specificities) and an alternative complex mixture of proteins with 
another barcode, for example to differentially barcode proteins from two different 
samples which are then mixed. It will be understood by those skilled in the art that the 
principle of adding peptide barcodes to proteins or other molecules could also be 
applied to non-peptide barcodes whereby such barcodes can be directly identified (or 
nearly identified) using mass or sequencing methods. As such, the barcodes could 
include nucleic acid barcodes attached to proteins or other molecules including nucleic 
acids. As with peptide barcodes, such nucleic acid barcodes can be analysed by mass 
spectrometry to provide an accurate estimate of mass. Such barcodes might be 
released from the proteins or other molecules using restriction enzymes instead of 
proteases. 

It will be understood by those skilled in the art that, within the scope of the present 
invention, there are applications of the first aspect other than in the isolation of 
proteins. For example, the distribution of proteins or other ligands within a live 
organism can be analysed by analysis of barcodes by mass or by sequence which are 
associated with specific organs within the organism. In the analysis of peptide or 
protein binding specificity to other molecules, barcodes can be constructed as part of 
the peptide or protein binding regions in order to analyse specificity by mass or 
sequence analysis of barcodes. For example, mixed peptide barcode sequences can be 
constructed around known anchor residues of MHC molecules and the spectrum of 
peptides which bind to specific MHC molecules then determined by elution and mass 
or sequence analysis of the barcode. 


In a second aspect, the present invention provides a method of screening a protein 
library comprising screening said library for one or more desired properties, followed 
by dereplication to identify one or more individual proteins in the library having the 
desired property. 

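The screen-then-dereplicate idea can be sketched as a recursive pool-splitting search. The code below is a simplified illustration rather than the method as claimed: the library is a list of clone identifiers, the pooled assay is a stand-in object that reports whether any member of a pool binds, and positive pools are halved repeatedly until single binders remain. The library size, pool size, binder identities and all names are hypothetical.

```python
class PooledAssay:
    """Stand-in for a pooled binding assay; counts how often it is run."""

    def __init__(self, binders):
        self.binders = set(binders)   # hypothetical; unknown in a real screen
        self.calls = 0

    def __call__(self, pool):
        self.calls += 1
        return any(clone in self.binders for clone in pool)


def dereplicate(pool, assay):
    """Split a pool recursively, following only the sub-pools that test positive."""
    if not assay(pool):
        return []
    if len(pool) == 1:
        return list(pool)
    mid = len(pool) // 2
    return dereplicate(pool[:mid], assay) + dereplicate(pool[mid:], assay)


library = list(range(10_000))
pool_size = 1_000
pools = [library[i:i + pool_size] for i in range(0, len(library), pool_size)]

assay = PooledAssay(binders={1234, 7777})
hits = [clone for pool in pools for clone in dereplicate(pool, assay)]

print(hits)
print(assay.calls, "pooled assays instead of", len(library), "individual ones")
```

Halving is only one way to reduce pool complexity; in practice the split sizes would depend on how the pools of genes are assembled and on the sensitivity of the assay.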

This aspect of the present invention provides for libraries of proteins, especially 
recombmant antibodies such as Fv's, whereby individual members of the libraries are 
isolated for binding to specific targets whereby pools of proteins from the library are 
screened individually and then positive pools are subjected to one or more rounds of 
40 derepUcation until the individual proteins in the library which bind to the target are 

identified. Specifically, this aspect relates to screening protein libraries without use of 
a display system i.e. where there is either no physical association of the proteins with
corresponding genes. In this aspect, pools of proteins are screened for binding to the
target whereby either the target is labelled to indicate which pool(s) contain proteins
which bind, or where the target is detected without labelling. A particularly favoured
method is to screen pools of proteins in solution without any fusion or attachment to
other moieties (which might influence the binding of proteins to their targets) and then
to precipitate the total protein pool (together with any attached target) prior to mass
analysis, especially via MALDI-ToF, in order to screen for a "fingerprint" of ionised
peaks which is representative of the target and therefore indicates if the target has
bound. Once one or more positively-binding pools are identified, these can then be
dereplicated either to reduce the complexity of the pool or to segregate out individual
proteins for screening for binders to the target. In practice, a particularly favourable
way of assembling pools of proteins is firstly to assemble pools of genes encoding
these proteins. If genes are cloned into plasmid or phage vectors for example, these
can be pooled by mixing together individual bacterial colonies or plaques, or more
conveniently by segregating pools of colonies/plaques by plating onto separate agar
plates (at densities such as 1000 colonies/plaques per plate) and scraping/eluting
colonies/plaques from these plates into one mixture which is then used for synthesis of
the proteins either through bacterial/phage expression or through in vitro
transcription/translation. In a similar manner, other microorganisms or in vitro
synthesis systems could be used for synthesis of proteins. This aspect also
encompasses the use of complex targets such as mixtures of molecules, whole cells or
cell membranes whereby the molecular target yields a mass analysis "fingerprint"
which is characteristic for binding to a specific molecular target within the complex
target. This aspect also encompasses, where the target is a protein, the use of proteases
to digest the target(s) in order to produce a peptide mass fingerprint indicative of the
target and which, where the protease also digests the protein(s) from the library, can
still be detected even within a background of other peptides derived from the library.
This aspect also encompasses a range of different types of "target" and criteria for
selection of pools of proteins or individual proteins other than by binding to a target.
For example, the aspect encompasses the use of biological assay systems as a criterion
for selection of proteins, for example where proteins are selected for the ability to
stimulate or inhibit a biological activity. Other formats of binding assays would
include inhibition of binding of a ligand to its receptor and selection for proteins which
bind to certain locations on a target where the target might be, for example, a
molecule, cell or tissue section.
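
By way of illustration only, the pool-and-dereplicate screening described for this aspect might be sketched in Python as follows. The function name dereplicate, the pool_binds callback (standing in for whatever pool-level assay is used, for example a mass fingerprint of the target) and the halving strategy are assumptions made for this sketch and are not taken from the specification.

```python
from typing import Callable, List, Sequence


def dereplicate(pool: Sequence[str],
                pool_binds: Callable[[Sequence[str]], bool],
                split: int = 2) -> List[str]:
    """Return the individual library members responsible for a positive pool.

    pool_binds is a hypothetical assay: it reports whether a pool, screened
    as a whole, shows binding to the target.
    """
    if split < 2:
        raise ValueError("split must be at least 2")
    if not pool or not pool_binds(pool):
        return []              # negative pool: all members discarded together
    if len(pool) == 1:
        return list(pool)      # fully dereplicated: one positive protein
    size = -(-len(pool) // split)  # ceiling division gives the sub-pool size
    hits: List[str] = []
    for start in range(0, len(pool), size):
        hits.extend(dereplicate(pool[start:start + size], pool_binds, split))
    return hits


# Toy library of 1000 members (one plate's worth) with two hidden binders.
binders = {"Fv-0417", "Fv-0733"}
library = [f"Fv-{i:04d}" for i in range(1000)]


def assay(sub_pool: Sequence[str]) -> bool:
    return any(member in binders for member in sub_pool)


hits = dereplicate(library, assay)
```

In this toy case each round halves the positive pools, so the two binders are located with far fewer assays than screening all 1000 members individually.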

In a third aspect, the present invention provides a method of protein identification
and/or sequencing comprising providing a library of individual proteins, one or more
of which may bind to a target of interest, wherein each individual protein, together
with its gene, is bound to an "associating moiety".

This aspect of the present invention provides for libraries of proteins, especially 
recombinant antibodies such as Fv's, whereby the proteins and their corresponding 
genes are linked together via the addition or inclusion of an "associating moiety"
whereby the proteins or Fv's bind to the target either before or after addition of the
"associating moiety". The associating moiety serves the purpose of enabling
regeneration of the proteins or Fv's via the associated corresponding gene, for
example by PCR amplification (or other means of amplification such as via bacterial
transformation, or by direct sequencing and subsequent regeneration via this
sequence). Where the proteins or Fv's are generated as a pool with a corresponding
pool of genes, then genes associated with the proteins or Fv's which bind to the target
(or which do not bind if so desired) are used as the basis for regeneration of individual
or smaller pools of proteins or Fv's in order to repeat screening to identify the specific
proteins or Fv's (via the corresponding genes) which bind to the target.

A particular format for this third aspect is where the associating moiety is a particle and
whereby recombinant proteins and their corresponding genes are co-immobilised on
particles whereby recovery of an individual particle provides for identification of the
gene or genes encoding the recombinant protein. This format particularly relates to
methods whereby genes encoding the recombinant proteins are co-immobilised on the
same particle as their corresponding proteins such that upon selection for the
recombinant protein, the corresponding gene will also be selected such that the
identity of the selected protein can be determined (by sequencing the gene) or such
that further recombinant protein can be generated from the gene. The method of the
invention includes provisions to control the amount of proteins displayed on the particle,
commonly by controlling the number of moieties on the particle to which the
recombinant proteins bind. The invention includes provisions to co-display other
molecules on the particle in conjunction with the recombinant protein including other
proteins or protein chains and including molecules to which the recombinant proteins
bind such as antigens.

In the basic operation of the third aspect of the present invention, there is provided an
array of genes or mixtures of genes from which are synthesised recombinant proteins
using methods such as in vitro transcription and translation or phage display, such
proteins being exemplified by antibody variable regions (Fv's). Subsequently, genes
and recombinant proteins are co-immobilised on particles; one or more ligands are
associated with the gene either as DNA or mRNA whereby such ligands become
bound to a "receptor" on the particle surface or whereby such ligands are reacted with
the particle surface to produce a covalent or ionic attachment. Alternatively, the gene
is directly immobilised on the particle via formation of one or more covalent or ionic
bonds to natural DNA or RNA reactive groups. The resultant recombinant proteins
encoded by the genes may have one or more ligands associated (such ligands being
moieties on the proteins by which immobilisation can be achieved) such as protein
sequence tags (encoded by the genes) or biotin groups (incorporated by in vitro
transcription and translation using biotinyl lysine) such that they too can become
bound to a "receptor" on the particle surface or whereby such ligands are reacted with
the particle surface to produce a covalent or ionic attachment. The ligands on the
genes and proteins can either be the same or different ligands with immobilisation on
the same or different receptors. For useful operation of this aspect of the invention,
genes or pools of genes either as DNA, mRNA or within a live microorganism such as
a phage, are distributed into arrays (or multiple reaction vessels etc) and recombinant
proteins are produced in such arrays (for example by in vitro transcription and
translation or by growth of phage). Master arrays containing the genes can be used as
the source of material for generating the recombinant proteins whereby samples of
genes or proteins are dispensed into server arrays such that the array location of each
gene or protein pool is preserved. Either before, during or after this process, one or
more particles are introduced into each position in the array providing receptors to
which genes and proteins can bind. In one variation of the invention, the genes are
attached to the particles at the outset and proteins produced directly from these genes
such that these recombinant proteins are subsequently immobilised onto the same
particle. Either before or following attachment of recombinant proteins to the
particles, the proteins can be optionally subjected to modification, for example
phosphorylation by kinases or binding by other proteins. In a variation of the
third aspect, the arrays include droplets such as oil-in-emulsion droplets or liposomes
into which genes or live microorganisms are segregated (usually by producing the
droplets prior to protein synthesis and thus arraying the genes within droplets).
Proteins are produced within the droplets and these are then co-attached to the
particles including the genes. In the case of droplets, the particle to which the genes
and proteins co-attach can either be introduced into the droplet or the particle can be
the droplet itself. For example, in the case of liposomes, the proteins could be
produced with lipophilic tags which combine with the liposome membranes
especially where this leads to "display" of the proteins on the outside surface of the
liposome. A related example is where in vitro translation of mRNA is used where
microsomal membranes can be introduced into the reaction whereby proteins with
lipophilic tags can integrate into such membranes which can subsequently be
dispersed into small particles.

If it is desirable to then pool the particles for a selection process, particles are then
retrieved from the arrays and mixed; the recombinant proteins on the particles are then
subjected to selection, typically by exposure to a target which binds to selected
proteins on the particles. Certain recombinant proteins could also be subjected to
modification at this stage. Particles holding selected or modified proteins could then
be retrieved by a variety of methods; for example, if the target is labelled with a
fluorescent label, FACS could be used to separate out particles with (or without) the
target. In the first major aspect of the present invention, genes encoding recombinant
proteins on such selected particles could then be recovered by, for example, PCR
amplification of the co-immobilised DNA or mRNA.
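
The logic of selecting pooled particles and recovering the co-immobilised genes can be pictured with the following minimal Python sketch. The Particle record, the recover_genes function and the binds_target callback (standing in for the physical selection step, such as FACS with a labelled target) are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Particle:
    gene: str      # identifier of the gene co-immobilised on the particle
    protein: str   # recombinant protein immobilised on the same particle


def recover_genes(particles: Iterable[Particle],
                  binds_target: Callable[[str], bool]) -> List[str]:
    """Return the distinct genes carried by particles whose protein is selected."""
    return sorted({p.gene for p in particles if binds_target(p.protein)})


# Particles retrieved from the array and pooled; names are placeholders.
pool = [
    Particle("gene-A", "Fv-A"),
    Particle("gene-B", "Fv-B"),
    Particle("gene-A", "Fv-A"),
    Particle("gene-C", "Fv-C"),
]
selected = recover_genes(pool, lambda protein: protein in {"Fv-A", "Fv-C"})
```

Because gene and protein travel on the same particle, selecting on the protein is sufficient to identify the gene, which is the point the sketch is intended to show.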

There are many types of "associating moieties" for linking proteins with their
corresponding genes which could be used in the third aspect of the present invention.
Particles of use include latex and magnetic particles, and particles onto which
synthetic oligonucleotides are synthesised directly. Such particles would commonly
be provided with a "receptor" to which the synthesised polypeptides can bind. Other
associating moieties may be single molecules or molecular complexes which can act
as a bridge to join the gene molecules to the synthesised proteins. For example, both
the gene molecules and synthesised proteins can include biotin groups which could
then be cross-linked by addition of streptavidin whereby streptavidin acts as the
associating moiety. In a similar fashion, a sequence tag on the proteins and a ligand
on the gene molecules can be cross-linked using, for example, a bispecific binding
reagent such as a bispecific antibody (binding to both sequence tag and ligand) or an
antibody-streptavidin conjugate (whereby the antibody binds either to a ligand on the
protein or gene and the streptavidin binds to biotin on the protein or gene, whichever is
non-liganded). Other associating moieties may be bacteria or bacteriophage whereby
the synthesised polypeptide binds to a specific ligand on the bacteria or bacteriophage.
For example, an M13 expression system can be used to produce an Fv fragment of a
specific antibody in E.coli which can then bind to a specific protein antigen on the
M13 itself, especially where this is displayed on the phage head fused to a capsid
protein. By testing for M13 phage to which Fv has bound, the gene encoding the
specific Fv can be determined by sequencing the Fv gene encoded by the M13.
Similarly, the M13 expression system can be used to produce a protein which binds to
a specific protein displayed on the M13 itself. In every case, the unique feature of the
third aspect is that the recombinant protein molecules become attached, after
synthesis, to the corresponding genes via an associating moiety. Such attachment after
synthesis especially allows for the unhindered synthesis of the protein molecules
without, for example, the need to be synthesised as a fusion with other protein
molecules which could alter the protein conformation or interfere with its recognition
or function.

The present invention includes several methods to generate the recombinant proteins
prior to linkage to the associating moiety. These methods especially include protein
synthesis by in vitro transcription and translation, and protein synthesis in bacteria
directed by plasmids or phage. In the latter case, the present invention provides the
advantage that the generated protein need not be fused with a phage protein as the
generated protein in the present invention is subsequently immobilised onto a separate
particle. In contrast to current methods for phage display of proteins where such
proteins are fused to a surface phage protein or protein which can reach the surface,
the third aspect of the present invention would require either lysis of the phage or for
secretion or leakage of the recombinant protein from the phage head in order to
provide for its subsequent immobilisation onto the particle. Other in vivo methods of
generating proteins such as expression in bacteria, yeast or even mammalian cells
could thus also be used in the third aspect which therefore has the advantage of being
more versatile than individual display methods. Thus, recombinant proteins could be
modified by a particular host, for example glycosylated by mammalian cells, prior to
immobilisation. One particularly useful aspect of the present invention is the ability to
control the numbers of molecules of recombinant protein on the associating moiety,
especially when this is a particle, by control of the number of "receptor" molecules on
the particle. In the case of antibody variable regions therefore, the valency of
individual or pools of antibodies can be varied according to selection criteria. A
further alternative associating moiety could be a live cell itself whereby the
recombinant protein is linked to a ligand on or near the surface of the live cell such as
a cell surface marker of the bacterium or mammalian cell harbouring the expression
plasmid or whereby, upon secretion, the protein would then bind to the cell from
which it was expressed. The protein could then be reacted with target and cells
harbouring the expression cassette for the specific Fv binding the target could be
isolated.

The third aspect herein provides a particularly useful means for selection of
recombinant proteins which bind to a target or for selection of recombinant proteins
which are modified by a specific treatment, for example by treatment with cell or
tissue lysates. The method accordingly will prove especially useful for the molecular
evolution of recombinant proteins whereby successive rounds of selection ensure
recovery only of proteins with stringent properties such as high affinity binding to a
target. The method can also encompass successive rounds of mutagenesis of selected
genes to maximise the diversity for evolutionary selection. It will be apparent to those
skilled in the art that there are many variations which could be employed based on the
third aspect of the present invention but falling within the scope of the present
invention. For example, associating moieties especially particles used to capture the
genes and recombinant proteins could themselves be bound by another polypeptide
chain whereby, when protein-protein binding occurs, the recombinant protein is not
captured by the particle directly but rather by the polypeptide chain already on the
particle. An appropriate tag or ligand on the recombinant protein can then be used to
provide a means for detecting the protein-protein binding event. In the same manner,
particles could be bound by synthetic oligonucleotides which are subsequently used to
anneal to the genes as a means to capture them on the particles.

In a fourth aspect, the present invention provides a method of protein identification
and/or sequencing comprising providing a library of individual proteins, one or more
of which may bind to a target of interest, wherein each individual protein is attached to
an individual "coding moiety".

In this aspect of the present invention, recombinant proteins synthesised from a gene
library are subsequently attached to "coding moieties" such as particles which are
distinguishable through one or other coding methods in such a manner that the coding
relates to the identity of the gene which encodes a recombinant protein attached to the
particle. Where the recombinant proteins are immobilised on coded particles, the
recombinant proteins may have one or more ligands associated such as protein
sequence tags or biotin groups such that they can become bound to a "receptor" on the
coded particle surface or whereby such ligands are reacted with the particle surface to
produce a covalent or ionic attachment. In the operation of this aspect of the present
invention, recombinant proteins or pools of proteins are synthesised or segregated in
large arrays. Particles with unique codes are then introduced into each position in the
array. Such codes include, for example, different ratios of measurable signalling
moieties such as fluorescent, chemiluminescent or radioactive labels or different
physical features which distinguish particles such as different shapes or markings, for
example a code or unique mark etched into the particle. In each case, individual coded
particles can be distinguished from each other. Particles of use in the present
invention include any such particles, complexes or molecules with the property that
proteins can be attached. Following pooling of particles and binding of the mixtures
of proteins on coded particles to a specific target, the coding of selected particles could
then be determined in order to determine their original array positions and hence the
array loci of genes encoding the selected recombinant proteins. As a variation of these
aspects of the invention, selected proteins on particles could be identified directly
using methods such as MALDI-TOF (mass spectroscopy) or using labelled antibodies
to identify known proteins. The operation and scope of this fourth aspect of the
present invention will share many aspects and scope of the above third aspect of the
invention.
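
As a hedged illustration of how a particle code might be traced back to an array position, the following Python sketch assumes a code made of the relative levels of two fluorescent labels, each at one of ten levels. The two-label scheme, the 96-position array and the names assign_codes and decode are assumptions chosen for the example and are not prescribed by the specification.

```python
from typing import Dict, List, Tuple

Code = Tuple[int, int]       # hypothetical code: levels of two fluorescent labels
Position = Tuple[str, int]   # array position given as (row, column)


def assign_codes(positions: List[Position], levels: int = 10) -> Dict[Code, Position]:
    """Give each array position its own code drawn from a levels x levels grid."""
    if len(positions) > levels * levels:
        raise ValueError("not enough distinct codes for this array")
    return {divmod(index, levels): position
            for index, position in enumerate(positions)}


def decode(measured: Tuple[float, float], codes: Dict[Code, Position]) -> Position:
    """Map a measured pair of label intensities to the nearest assigned code."""
    nearest = min(codes, key=lambda c: (c[0] - measured[0]) ** 2
                                       + (c[1] - measured[1]) ** 2)
    return codes[nearest]


# A 96-position array: rows A to H, columns 1 to 12.
positions = [(row, col) for row in "ABCDEFGH" for col in range(1, 13)]
codes = assign_codes(positions)

# A selected particle measured at label levels (2.1, 7.8) decodes to one position.
origin = decode((2.1, 7.8), codes)
```

Once the array position is known, the gene or gene pool held at the corresponding locus of the master array identifies the selected protein.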

In a fifth aspect, the present invention provides a method for analysing
mixtures of proteins comprising:

(i) digestion or cleavage of the protein mixture;

(ii) fractionation of the resultant peptides; and

(iii) analysis of the resultant peptides by means of their mass and/or sequence.

This aspect of the present invention relates to methods for analysing mixtures of proteins.
In particular, the invention relates to methods to compare proteins between different
cells and tissues. The invention involves the combination of digestion or cleavage of
protein mixtures, fractionation of peptides using a library of protein binding reagents,
and subsequent analysis of peptide fractions for mass or sequence. The invention
includes optional physical fractionation of proteins or peptide fragments additional to
fractionation with protein binding reagents. Current methods to analyse en masse
complex mixtures of proteins such as in mammalian cells or tissues require that the
proteins are separated by technologies such as two dimensional (2D) gel
electrophoresis. For this technology, cellular proteins are usually separated on the
basis of charge in one dimension and on the basis of size in the other dimension.
Proteins can either be identified with reference to the electrophoresis migration pattern
of a known protein or by elution of the protein from the electrophoretically separated
spot and analysis by methods such as mass spectrometry and nuclear magnetic
resonance. However, limitations of the 2D protein gel method include the limited
resolution and detection of proteins from a cell (typically only 5000 cellular proteins
are clearly detected), the limitation to identification of separated proteins (for example,
mass spectrometry usually requires 100 fmoles or more of protein for identification),
the specialist nature of the technique and the difficulty in automating the technique in
order to achieve very high protein analysis throughputs. There is thus a need for
superior methods to analyse complex mixtures of proteins en masse especially using
methods without gel electrophoresis and methods which are easy to automate.

The core of the fifth aspect is that proteins are either digested or cleaved into smaller
peptide fragments and then fractionated using a library of protein affinity reagents and
then subjected to mass analysis especially by mass spectroscopy. Optionally, proteins
or peptide fragments may be fractionated physically in addition to being fractionated
with protein affinity reagents and may also be conjugated with one or more "chemical
tags" to assist in fractionation.

The major aspect of the fifth aspect provides for cleavage of proteins using proteases
or chemical methods; fractionation of the peptide mixture thereby produced and
subsequent mass analysis. Fractionation of peptides is achieved using protein affinity
reagents, especially libraries of recombinant antibody fragments. Optionally, the
method includes additional fractionation of proteins or peptides using physical
methods or specific affinity reagents such as antibodies or solid phases or reactive
chemical groups to isolate peptides or mixtures of peptides for subsequent mass
analysis. Protein affinity reagents are used to retrieve individual peptides or sets of
peptides from the peptide mixture for subsequent mass analysis. Alternatively or
additionally, protein affinity reagents can be used to eliminate peptides from the
mixture whereby the mixture is itself subsequently subjected to mass analysis. The
protein affinity reagents can either bind by virtue of specific sequences or structures in
peptides or by virtue of specific chemical groups either as natural constituents of the
peptides or as chemical tags which are added to the peptides either before or after
cleavage.

For analysis of larger mixtures of peptides, panels of protein affinity reagents such as
those provided by recombinant libraries of antibody Fv fragments (including
single-chain Fv's) can be used in order to isolate subsets of peptides for subsequent analysis.
Such panels of Fv's will include a wide range of peptide specificities which could be
achieved, for example, by pre-absorbing antibody libraries on the peptide samples of
interest or by immunising animals with peptide samples of interest and generating
recombinant Fv libraries from the animal B cells. Alternatively, polyclonal antisera or
panels of monoclonal antibodies could be prepared from immunised animals and used
to fractionate peptides. Then individual or mixtures of the selected antibodies are used
to isolate (or eliminate) the specific subsets of peptides from a test sample.

Subsequent mass analysis of a range of peptides can facilitate the detection of
differences in specific proteins between test samples.

Generation of recombinant Fv's or antibodies to all peptides in a mixture is difficult
and is highly dependent on the number of peptides in a mixture and the facility for
individual peptides to be bound with reasonable affinity to antibodies ("antigenicity").
With a very large peptide mixture, a limitation is redundancy whereby antibodies with
the same peptide specificities are repeatedly represented whilst antibodies to other
peptide specificities are underrepresented or absent. This may cause a particular
protein not to be mass analysed if none of the peptides from a particular protein are
bound by an antibody. Therefore, a particularly useful method is to isolate N or C
terminal peptides (or both) from a protein by preabsorption of the protein to a solid
phase via its N and/or C terminus prior to cleavage or by chemical tagging of the N
and/or C terminus for subsequent isolation after cleavage. In principle, this then
should lead to recovery of all N and/or C terminus peptides representing all proteins
from the sample. Such isolation of N and/or C terminal peptides is greatly facilitated
by the differential reactive nature of the N terminal amino group and the C terminal
carboxyl group in the protein compared to internal amino and carboxyl groups. Such
isolated N and/or C terminal peptides can be further fractionated using other affinity
reagents which either recognise specific peptide sequences or which recognise
chemical tags on the peptides or further fractionated by physical means such as HPLC.
Such isolated N and/or C terminal peptides are then fractionated using protein affinity
reagents prior to mass analysis. The invention also allows for sequential conjugation
of different chemical tags to the protein / peptide mixture especially where N or C
termini are sequentially exposed by specific cleavage of the protein / peptide and
whereby the N or C termini (or both) are conjugated with a specific chemical tag upon
exposure of that terminus. This aspect of the invention therefore provides for a series of
protein fractions with a range of conjugated chemical tags introduced at the termini,
such fractions being isolated using an affinity reagent which binds to the tag. As a
particularly useful method as an alternative to a chemical tag at the terminus of the
protein molecule, chemical tags can also specifically be attached to non-terminus
amino acids such that internal peptides can be isolated via an internal chemical tag.
Unique chemistries are available for attachment of ligands to several specific amino
acids, for example to the ε-amino groups of lysines, the thiol groups of cysteines and
the carboxyl groups of aspartic and glutamic acids. One advantage of isolating
peptides by virtue of non-terminal tags is that selection can be made for larger peptides
which are more likely to contain a specific amino acid to which a tag is attached thus
isolating peptides with a mass which exceeds low molecular weight masses with a
larger background noise during mass analysis. Another advantage is the array of
reagents already available to introduce chemical tags onto specific amino acids within
proteins or peptides especially reagents which provide a biotin tag.

Another embodiment of the fifth aspect provides for sequential cycles of protein
cleavage using proteases or chemical methods with fractionation with protein affinity
reagents either during or following successive protein cleavage steps and subsequent
mass analysis. In this case, the analysis of protein mixtures is assisted by sequential
cleavage cycles whereby the spectrum of proteins and peptides are fractionated with
the protein affinity reagents and analysed following each cleavage cycle. This method
could also include chemical tagging cycles between cleavage cycles to increase the
mass or steps to remove side-groups such as carbohydrate groups in order to reduce
mass. If the mass of the range of protein fragments is then determined at the end of
each cleavage cycle (either with or without chemical tagging, cleavage or other
modification), then a range of mass distributions will be obtained for each cycle. With
an appropriate series of mass modification cycles, the result for a single protein or a
mixture will be a mass spectrum of protein/peptide fragments which is altered at
successive cycles; the pattern of these alterations will provide a "fingerprint" for the
specific proteins/peptides in the mixture. The appearance and disappearance of a
particular protein/peptide fragment of a certain mass following specific cleavage
cycles with or without chemical tagging, cleavage or other modifications will provide
a fingerprint for identification of the fragment sequence especially by reference to a
database of such fingerprints. Comparison of the spectrum of protein/peptide
fragments from different related samples then allows for the identification of
protein/peptide fragment differences between these samples. Particularly useful in this
aspect of the present invention are proteases which specifically recognise two amino
acids and cleave the protein as a result. Examples of such proteases are the
prohormone convertases which cleave between dibasic amino acid pairs. Therefore,
the fifth aspect of the present invention provides for novel ways of analysing protein
mixtures using a combination of protein digestion or cleavage, fractionation using
protein affinity reagents and mass analysis.
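
The idea of a fingerprint built from the appearance and disappearance of fragment masses over successive cleavage cycles can be sketched in Python as follows. The representation of each cycle's spectrum as a set of nominal integer masses, the function name cycle_fingerprint and the reference entries are all assumptions invented for this illustration; the masses are not measured values.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# One change record per cycle transition: (masses appearing, masses disappearing).
Change = Tuple[FrozenSet[int], FrozenSet[int]]


def cycle_fingerprint(spectra: List[Set[int]]) -> Tuple[Change, ...]:
    """Summarise how the set of fragment masses alters from cycle to cycle."""
    return tuple((frozenset(after - before), frozenset(before - after))
                 for before, after in zip(spectra, spectra[1:]))


# Hypothetical reference spectra (nominal masses) for two proteins, three cycles each.
reference: Dict[str, List[Set[int]]] = {
    "protein X": [{5000}, {3200, 1818}, {3200, 1000, 836}],
    "protein Y": [{5000}, {2600, 2418}, {2600, 2418}],
}
database = {cycle_fingerprint(spectra): name for name, spectra in reference.items()}

# Spectra observed for an unknown sample after the same three cycles.
observed = [{5000}, {3200, 1818}, {3200, 1000, 836}]
fingerprint = cycle_fingerprint(observed)
identity: Optional[str] = database.get(fingerprint)
```

In this toy data the two proteins share the same starting mass and are distinguished only by the pattern of alterations, which is the property the fingerprint relies on. A practical comparison would need a mass tolerance rather than exact matching.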

In a related aspect of the fifth aspect, proteins are fractionated prior to cleavage. For
large protein mixtures, particularly those isolated directly from whole cells or tissues,
the pre-fractionation of proteins may be desirable in order to reduce the complexity of
mixtures subjected to subsequent cleavage, peptide fractionation and mass analysis.
Whilst protein affinity reagents which bind sequences or structures in the
proteins/peptides directly are primarily useful, an alternative or an addition is to use a
library of chemical tags to provide moieties bound by a set of protein affinity reagents.
More conventional means of pre-fractionation include the use of gel electrophoresis
either in one or two dimensions where sections of the gel are isolated and the proteins
within then subjected to cleavage and mass analysis. Other pre-fractionation methods
include isolation of proteins by virtue of natural modifications such as
phosphorylation, glycosylation, protein-protein (or peptide) interaction; alternatively,
membrane proteins can be pre-fractionated or proteins from particular compartments
within the cell. Another important pre-fractionation procedure is to remove highly
abundant proteins from the mixture using affinity reagents such as antibodies to bind
and remove such proteins. As an alternative to pre-fractionation, peptides generated
after cleavage can also be fractionated by many of these means and also including
size/charge fractionation methods using HPLC. Such methods are particularly useful
to fractionate peptides which have already been selected from a mixture through the
application of protein affinity reagents. In particular, HPLC can be interfaced with
mass analysis such that peptide fractions from HPLC separation are directly subjected
to mass analysis. Peptides generated after cleavage can also be fractionated by virtue
of natural modifications using, for example, antibodies which bind phosphorylated
amino acids within peptides. Prefractionation of proteins may also be achieved by
using protein affinity reagents such as monoclonal/polyclonal antibodies to isolate
specific proteins for subsequent cleavage and mass analysis. For such analysis of
larger mixtures of proteins, libraries of antibodies such as those provided by
recombinant libraries of Fv's are preferred in order to isolate subsets of proteins or
subsets of cleaved peptides for subsequent mass analysis. Such libraries of antibodies
will include a wide range of protein or peptide specificities but can also be
pre-enriched for binding to proteins/peptides of interest in the particular sample of interest.
For peptides, this is preferably achieved by testing individual Fv's for selective
binding to a single or a small number of peptides in the sample. Alternatively,
pre-enrichment can be achieved by pre-absorbing antibody libraries on the mixed
protein/peptide sample of interest and then using individual or mixtures of the selected
antibodies in order to isolate subsets of proteins or peptides. Fractionation with
protein affinity reagents provides mass spectra for a range of different protein/peptide
fractions thus facilitating detection of differences in specific proteins between
samples.

A further advantage of the use of chemical tags is that the subsequent fractionation of
peptides by affinity reagents can greatly reduce the number of selected peptides from a
protein molecule with the rest of the molecule thus being eliminated from the mass
analysis. An especially convenient method for selective chemical tagging is to tag
either (or both of) the N and C terminus of the protein molecules in the mixture and
then to digest or cleave the protein molecules with a reasonably selective reagent such
as an amino acid or sequence-specific protease (such as endopeptidase Arg-C) or
cleavage reagent (such as acid pH to cleave at Asp-Pro). Using an affinity reagent, N
or C terminal peptides (or both) from the original protein could then be isolated and all
internal peptides discarded. This reduction in complexity is then sufficient for mass
analysis especially using HPLC coupled to a tandem mass spectrometer to analyse the
peptides en masse in order to identify the individual peptides from the mixture.
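
As a rough illustration of the terminal-peptide selection described above, the following Python sketch cleaves a protein sequence after every arginine and separates the N and C terminal peptides from the internal ones. It is a simplified stand-in for endopeptidase Arg-C that ignores missed cleavages and sequence-context effects; the function names and the example sequence are hypothetical and are not taken from the specification.

```python
def cleave_after(protein, residue="R"):
    """Split a one-letter protein sequence after each occurrence of `residue`.

    Simplified model of a residue-specific protease such as Arg-C;
    missed cleavages and context effects are deliberately ignored.
    """
    peptides, current = [], ""
    for aa in protein:
        current += aa
        if aa == residue:
            peptides.append(current)
            current = ""
    if current:  # trailing peptide that does not end in the cleavage residue
        peptides.append(current)
    return peptides


def select_terminal_peptides(protein, residue="R"):
    """Return (N terminal peptide, C terminal peptide, internal peptides)."""
    peptides = cleave_after(protein, residue)
    if not peptides:
        return "", "", []
    # With no cleavage site, the single peptide carries both termini.
    return peptides[0], peptides[-1], peptides[1:-1]


# Hypothetical example sequence, chosen only for illustration.
n_term, c_term, internal = select_terminal_peptides("MKTAYRGLSDERPPWTK")
print(n_term, c_term, internal)
```

In this sketch the internal peptides are simply returned as a third value; in the scheme described above they would be the material discarded after affinity capture of the tagged termini.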

Alternatively, chemical tagging could be performed only after digestion/cleavage, for
example with the dibasic cutters, the prohormone convertases. This would provide for
tagging only at one or more internal sites of the original proteins. If the protein
mixture is then subjected to a second digestion/cleavage step with a different enzyme
or cleaving reagent, then the size of the tagged peptides would be reduced where a
cleavage site was present in the original protein. The tagged peptides could then be
fractionated using protein affinity reagents and subjected to mass analysis.

In another embodiment of the fifth aspect, a protein mixture is subjected to cycles of
tagging, digestion/cleavage and mass analysis, whereby fractionation by protein
affinity reagents and mass analysis is performed only on an aliquot of the mixture
resultant from use of an affinity reagent binding to the specific chemical tag and
whereby the master mixture is then subjected to tagging with a different chemical tag
and digestion/cleavage. This provides sequentially a range of different fragments.
Another variation on the method involves the same initial steps as above but, having 
exposed new N and C termini after cleavage, one (or both) of these new termini can 

original protein. If required, the process could be repeated one or more times with a 
different protease or cleavage reagent, each time with the addition to the N or C 
terminus of a different chemical tag. In one format of the method, the whole mixture 
of proteins would first be tagged with two different chemical groups at each of the N
and C terminus and then cleaved with a protease, such as one which specifically cuts
adjacent to a specific amino acid, and tagged again at the new N and C termini with
two further different chemical groups. This would result in a mixture of peptides each
with chemical tags at the termini. As the N and C terminal peptides would have a
specific tag, these could then be isolated from the mixture using appropriate affinity
reagents. Internal peptides without either the initial N or C terminal tags could be
isolated using their specific tags. The process of digestion and tagging could then be
repeated to create further peptides with tags. Using specific combinations of affinity
reagents for specific tags, N or C terminal or specific internal peptides from the
original protein could then be isolated and selected peptides discarded to achieve a
reduction in complexity. Where chemical tags are added to two or more amino acid
side groups within peptides, sequential use of affinity tags could isolate fractions of
peptides containing specific combinations of amino acids. For example, if a mixture
of peptides of average length of 20 amino acids is separately tagged at lysine and
phenylalanine and the mixture comprises 25% of peptides which include neither lysine
nor phenylalanine, 25% with lysine only, 25% with phenylalanine only and 25% with both,
then the separate or sequential use of specific affinity reagents either for lysine or
phenylalanine will result in fractionation of peptides into four equal fractions. In
practice, such a fractionation scheme will favour the binding of larger peptides to
affinity reagents as these peptides are more likely to contain one or more of the
specific amino acids tagged. This will bias against the very small peptides such as
those with molecular weights less than 1000 daltons which, when subjected to mass
spectrometry analysis, will be more likely to coincide with background noise due to
fragmented peptides and other small molecules.
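
The four-way split in the lysine/phenylalanine example can be sketched in Python as below. The sketch assumes that a peptide is captured by a tag-specific affinity reagent whenever its sequence contains the tagged residue at least once; the function name, the fraction labels and the example peptides are hypothetical and serve only to illustrate the logic.

```python
def fractionate_by_tags(peptides, first="K", second="F"):
    """Sort peptides into four fractions by presence of two tagged residues.

    Models sequential use of two affinity reagents: each peptide either
    binds or flows through the reagent for `first`, and likewise for `second`.
    """
    fractions = {"neither": [], "first only": [], "second only": [], "both": []}
    for peptide in peptides:
        has_first = first in peptide    # would bind the reagent for the first tag
        has_second = second in peptide  # would bind the reagent for the second tag
        if has_first and has_second:
            fractions["both"].append(peptide)
        elif has_first:
            fractions["first only"].append(peptide)
        elif has_second:
            fractions["second only"].append(peptide)
        else:
            fractions["neither"].append(peptide)
    return fractions


# Hypothetical peptides: one for each of the four classes in the example.
example = ["AGSVLE", "AKSVLE", "AFSVLE", "KFSVLE"]
for name, members in fractionate_by_tags(example).items():
    print(name, members)
```

Because longer peptides are more likely to contain a given residue, running such a sketch on a realistic digest would show the size bias noted above: short peptides accumulate in the unbound fraction.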

Where analysis of complex protein mixtures is required such as in mammalian cells or
tissues, the present invention provides a main method where proteins are fractionated
using protein affinity reagents either before or after cleavage and the peptides are then
mass analysed. The fractionation of a complex mixture of proteins or peptides
requires a correspondingly complex mixture of protein affinity reagents and can be
assisted by one or more additional affinity reagents which can recognise features of
the proteins/peptides which are the basis for fractionation. Where cleavage is
conducted prior to fractionation, the most common method used in the present
invention is to cleave the whole protein mixture with a protease such as trypsin or V8
(Glu-C) protease and to then selectively isolate and mass analyse certain peptides.
Commonly, N or C terminal peptides (or both) from the peptide mixture are isolated
typically by adding a chemical tag to the N and/or C terminus of the proteins prior to
cleavage and using an affinity reagent which isolates peptides with the chemical tag.
Alternatively, specific peptides (N / C terminal or otherwise) can be isolated using
affinity reagents which have been selected for binding to specific peptides within
specific proteins; these will then select out those peptides from the mixture. For more
complex mixtures of proteins, a further fractionation step such as HPLC fractionation
based on size, charge or hydrophobicity is preferred prior to mass analysis especially
as this can be interfaced with mass analysis. Selective isolation of peptides then
allows for comparative analysis of specific peptides derived from alternative protein
mixtures for their relative quantities (relating to relative levels of the proteins in their
respective mixtures) and, in certain cases, for modifications of the peptides.

For fractionation of N or C terminal peptides, the preparation and use of protein
affinity reagents is an important aspect of the present invention and the labelling of the
N or C terminus of proteins is another important aspect. With a typical mixture of
proteins from mammalian cells or tissues or from many living organisms, several of
the N termini of these proteins (and some C termini) will be modified (for example, by
methylation) such that addition of a chemical tag to the terminus may be blocked. In
addition, in a typical mixture of proteins from mammalian cells or tissues or from many
living organisms, the proteins will occur at different relative levels of abundance
including, commonly, certain highly abundant proteins. Where protein mixtures
from mammalian cells or tissues or from other living organisms are used for the initial
selection of protein affinity reagents, such highly abundant proteins may dominate
selection of affinity reagents and may be predominant in the final peptide mixture for
mass analysis. A solution to both of these problems is to use an artificial source of
mixed proteins to isolate the affinity reagents. Typically, this will be a gene
expression system whereby a gene (usually cDNA) library is used to generate the
proteins without N or C terminal modifications. In addition, the use of a gene
expression system allows the gene library to be "normalised" to reduce or remove
highly abundant genes within the library. This is typically achieved by self-annealing
of the DNA (or RNA) prior to constructing the library. Therefore, a common method
in the present invention is to generate proteins by expression of gene libraries (usually
normalised) resulting in proteins free from significant N or C terminal modifications
and, where normalised, resulting in a protein mixture free from domination by specific
proteins. A typical expression system used with gene libraries is in vitro transcription
and translation using a eukaryotic ribosome preparation; this also provides the
possibility of incorporating modified amino acids into the expressed proteins. The
expressed protein mixture can then be used directly for N or C terminal labelling.
Other expression systems could also be used where N terminal amino groups or C
terminal carboxyl groups are not modified or prevented from subsequent chemical
tagging. Where modification occurs, in some cases the N terminal modification can
be removed either using enzymes such as histone deacetylase or chemical methods
such as limited cyanogen bromide cleavage to remove N terminal methionines.
Having produced a mixture of proteins free from N/C terminal modification, chemical
tags can then be added to the N/C terminal amino group(s). For the N terminus, the
ε-amino group of lysines can be initially blocked using reagents such as citraconic
anhydride or methyl acetimidate to then allow only the N terminal amino groups to
react. Alternatively, the ε-amino group of lysines can be blocked by incorporating
modified lysines into the expression system such as in vitro transcription / translation
whereby, for example, biotin-modified lysines can be directly incorporated instead of
lysines. Chemical tags can then be added selectively to the N terminus of proteins, for
example using isothiocyanates of specific molecules to which an affinity reagent is
available. One such example is fluorescein which is incorporated by reaction of the
proteins with fluorescein isothiocyanate allowing subsequent purification with
anti-fluorescein antibodies. Alternatively, polycarboxylic chelating agents can be
incorporated as isothiocyanates allowing subsequent purification with specific metals.
Once the N and/or C termini of proteins in the mixture are tagged, the protein is then
comprehensively and specifically cleaved either chemically or enzymatically, using
proteases such as trypsin or another cleaving agent. Such cleavage thereby releases
from each protein an individual tagged terminal peptide segment, and such a collection
of fragments can then be purified from the mixture of untagged peptides using an
appropriate affinity reagent such as an antibody specific for the chemical tag. If
required, the size of the chemical tag can be increased in order to produce a larger
mass for analysis; this would be useful for peptide fragments resulting from cleavage
very close to the chemical tag whereby the resultant fragment might be so small as to
be mass analysed within lower molecular weight "noise". The chemical tag might, for
example, comprise a piece of nucleic acid attached to the peptide via a reactive group
introduced during synthesis of the nucleic acid. Such a nucleic acid molecule might
also be useful for isolation of the tagged peptide via annealing of the nucleic acid to a
complementary sequence.

Following chemical tagging and isolation, the recovered mixture of N/C terminal
peptides is then used as a "bait" for the isolation of protein affinity reagents to bind to
these same peptides from proteins derived directly from mammalian cells or tissues or
from other living organisms. Such affinity reagents will typically derive from a library
of recombinant Fv's displayed as part of a particle containing the corresponding gene
encoding the antibody. Examples of such particles are ribosome display particles or
phage display particles, in each case where the genes from selected antibodies can be
rescued in order to propagate those specific antibodies. As an alternative, large arrays
of antibodies (such as recombinant single chain or Fabs, Fvs) can be screened using
the N/C terminal peptide mixture and antibodies which display binding to the peptides
can be recovered via the corresponding genes. As another alternative, N and/or C
terminal peptides could be used to directly generate polyclonal or monoclonal
antibodies by appropriate immunisation of an animal. By these means, a library of
protein affinity reagents is selected which can then be used for the analysis of mixtures
of proteins such as from mammalian cells or tissues or from other living organisms.
Such analysis can either involve using the library of affinity reagents to select out N/C
terminal peptides from proteins derived from mammalian cells or tissues or from other
living organisms or using individual affinity reagents to select out individual peptides.
The selected peptides can then be mass analysed typically by MALDI-ToF
(matrix-assisted laser desorption/ionisation time-of-flight) where the individual peptides give
individual charge:mass ratios which can then be used to identify the peptide amino
acid constituents. MS-MS (double mass spectroscopy) peptide sequencing can
subsequently be used to identify the peptide if it can be isolated. Alternatively, the
new generation of Quadrupole-ToF LC-MS-MS ("Q-ToF") instruments can provide
for sequential MALDI-ToF and MS-MS within the same instrument. Indeed, protein
affinity reagents either individually or in mixtures can be immobilised either indirectly
or directly onto the desorption chip inserted into the MALDI-ToF instrument and
peptides can be subsequently bound via the affinity reagents on the chip. In this way,
multiple peptide fractions adsorbed by multiple affinity reagents at different loci can
be analysed on a single chip. The use of recombinant proteins as the "bait" to isolate
protein affinity reagents also provides the prospect of attaching other tags to those
proteins whereby the tags are encoded by the gene sequence; for example, a C
terminal polyhistidine tag (allowing subsequent purification of the tagged fragments
using nickel chelates) could be incorporated, for example through PCR-mediated
incorporation into the gene sequences.
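
To illustrate how a measured mass can be related to amino acid constituents, the following Python sketch computes the monoisotopic mass of an unmodified peptide and the mass:charge value of its protonated ion. The residue masses are standard monoisotopic values; the function names are hypothetical, and chemical tags or natural modifications (which would add to the mass) are deliberately left out of this minimal sketch.

```python
# Standard monoisotopic residue masses in daltons (one-letter codes).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one water per peptide (free N and C termini)
PROTON = 1.00728   # mass added per positive charge


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified, untagged peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mass_to_charge(sequence, charge=1):
    """m/z of the protonated peptide ion carrying `charge` protons."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


# Hypothetical example peptide, chosen only for illustration.
print(round(peptide_mass("GLSDER"), 4), round(mass_to_charge("GLSDER"), 4))
```

Note that leucine and isoleucine have identical masses, so a mass alone constrains composition rather than sequence, which is consistent with the subsequent use of MS-MS sequencing described above.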

The use of recombinant proteins as the "bait" to isolate protein affinity reagents also
provides another common method of the fifth aspect of the present invention for
specifically isolating peptides using tags encoded by the recombinant proteins. Such
tags can be conveniently incorporated into members of a gene (usually cDNA)
library during its construction or into individual clones or groups of clones thereof
using specific PCR primers encoding such tags and designed to incorporate such tags
into the resultant expressed proteins. Preferably, such tags will be incorporated into
the expressed proteins in all reading frames in order to produce a productively tagged
protein. Such tags will preferably be incorporated via the downstream primer of a
PCR reaction with the usual result that the tag is produced towards the C terminal end
of the expressed protein (although upstream termination codons may prevent this in
some clones). However, tags may also be incorporated at the N terminal end or at
both N and C termini.

For the isolation of specific peptides from a peptide mixture, the peptide sequences
can be produced synthetically (or via recombinant DNA) and then, as above, used as
the "bait" to capture specific protein affinity reagents. These affinity reagents can then
be used to isolate these same peptides from a cleaved protein mixture derived from,
for example, mammalian cells or tissues or from other living organisms.

As an alternative to selectively fractionating N or C terminal peptides or specific
internal peptides, modified peptides such as peptides including phosphorylated amino
acids can be isolated using antibodies which selectively bind to phosphorylated
amino acids (tyrosine, threonine or serine or combinations thereof) or using
immobilised Fe3+ to trap negatively charged peptides. Similarly, peptides modified
by glycosylation and other modifications can be isolated, in some cases where the
peptide modification is further derivatised in order to facilitate isolation. For example,
carbohydrates can readily be modified via periodate reactions as an intermediate to
adding chemical tags such as fluorescein. A particularly important aspect of the
invention is the fractionation of selectively modified peptides whereby such peptides
are selectively tagged by virtue of their differential exposure to tagging within the
original protein environment prior to cleavage. For example, surface exposed proteins
on living cells can be selectively tagged, for example with biotin, by treating the cells
with a tagging agent which preferentially reacts with specific amino acid groups. An
indirect method for achieving such tagging in proteins which are naturally tagged via
other stimuli within cells is to apply such stimuli in order to effect tagging of the
proteins. For example, receptor-associated tyrosine kinase molecules within cells can
potentially be tagged (for example, phosphorylated) by addition of the receptor ligand
to those cells. Following modification, peptides are released from proteins by
cleavage and then directly mass analysed or subjected to fractionation with protein
affinity reagents as above prior to mass analysis.

Mass analysis of proteins and peptides by the present invention is preferably
performed using mass spectroscopy. In particular, MALDI-ToF analysis has the
capability to very accurately measure specific mass:charge ratios for individual
peptides. This method has the capability for simultaneous analysis of thousands of
peptides. Above 4 kD, the resolution of individual peptides (and proteins) becomes
poorer such that cleavage of proteins into peptide fragments is necessary in order to
provide fine resolution. Recent methods of interfacing liquid chromatography
separation methods (such as HPLC) with tandem mass spectroscopy have already
permitted the mass spectrum analysis of protein mixtures comprising up to 200
proteins. As such proteins are analysed following protease digestion, if an average of ten
peptides per protein is assumed, then the method can analyse up to 2000 peptides.
Using methods of the present invention whereby, for example, only tagged N terminal
peptides are analysed, then up to 2000 N terminal peptides derived from up to 2000
proteins could be analysed at any one time. As this is not sensitive enough for an en
masse analysis of mammalian proteins from cells (typically 50,000 per cell), then
peptides have to be segregated into at least 25 fractions in order for these fractions all
to be analysed. Such further fractionation can be achieved either directly using a
pre-selected library of protein affinity reagents, or by the use of reagents to label internal
ends after successive protein digestion/cleavage steps following which specific protein
affinity reagents are used to fractionate peptides according to their tags. As an
alternative to standard mass spectroscopy, MALDI-ToF can be used to produce
protein mass profiles which can be compared for protein mixtures from different cells.
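
The capacity estimate above can be reproduced with a few lines of Python. The figures (200 proteins per run, ten peptides per protein, 50,000 proteins per cell) are those quoted in the text; the function and parameter names are hypothetical, and the sketch assumes one terminal peptide per protein and perfectly even fractions.

```python
import math


def fractions_needed(proteins_in_sample, proteins_per_run=200, peptides_per_protein=10):
    """Minimum number of fractions so that each fits one LC-MS-MS run.

    Assumes one tagged terminal peptide is analysed per protein and that
    peptides divide evenly between fractions (an idealisation).
    """
    # Peptide capacity of one run: 200 proteins x 10 peptides = 2000 peptides.
    peptide_capacity = proteins_per_run * peptides_per_protein
    # One terminal peptide per protein, so the sample contributes one peptide each.
    return math.ceil(proteins_in_sample / peptide_capacity)


print(fractions_needed(50_000))  # 50,000 / 2000 = 25
```

Uneven fractions or several tagged peptides per protein would raise the number of fractions required, which is why the text states "at least 25".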

Chemical tags are typically moieties which can be covalently attached to proteins
usually at the N or C terminus. For chemical tagging of the N terminus, this is
commonly undertaken at the terminal amine group. If it is necessary to avoid tagging
of the ε-amino group of lysines, then these can be initially blocked using reagents
such as citraconic anhydride or methyl acetimidate. Terminal amine groups are then
reactive with a wide range of chemical reagents especially using isothiocyanates.
Thereby, common antibody-recognised ligands such as dinitrophenol and fluorescein
can then be attached to the N terminus for subsequent fractionation using an antibody
affinity reagent. For example, the commonly used Edman reagent phenyl
isothiocyanate can be used to specifically attach to the N terminus of proteins and can
be derivatised if necessary with a moiety provided for subsequent binding to an
affinity reagent. For chemical tagging of the C terminus, methods based on
carbodiimide activation are commonly used to introduce ligands which are bound by
affinity reagents. Alternatively, addition of moieties to the C terminus of proteins has
been described using reverse proteolysis whereby certain proteases such as
carboxypeptidase Y and lysyl endopeptidase can work in reverse to add chemical tags,
commonly by way of amino acids either as derivatised amino acids with tags for
binding to an affinity reagent or by way of natural sequences of amino acids which can
then be specifically bound by an affinity reagent. It will be recognised that a wide
range of internal amino acids can also be chemically tagged including Lys via the
ε-amino group, Glu / Asp via the carboxyl group, Cys via the thiol group, Ser / Thr via
the hydroxyl group and Tyr via the hydroxyphenyl group. Specific derivatisations of
most other amino acids have been described. It will also be recognised that
post-translation protein modifications can be used for addition of chemical tags especially
with glycosylation where the sugar residues are commonly oxidised by periodate to
form aldehyde groups which can then react with amine-containing molecules. Other
modifications which can be used to add chemical tags include lipidation,
phosphorylation and metal ion addition. It will be recognised that there are a large
number of methods in the art for introducing one or more chemical tags at specific
sites within protein molecules or peptides.

Protein affinity reagents for use in the fifth aspect are commonly monoclonal
antibodies. For specific sequences or structures within proteins or peptides, a library
of recombinant antibody binding sites usually in the form of Fab's, Fv's or single-chain
Fv's is used where commonly the antibody binding sites are "displayed" using, for
example, bacteriophage or ribosome complexes such that the gene encoding individual
antibody binding sites can be recovered. For use in the present invention, libraries of
antibody binding sites can be dispersed into groups, for example by picking and
arraying phage plaques or picking and arraying genes in vectors for ribosome display.
Such pools will usually contain antibody binding sites for several proteins or peptides
such that the pools can be used for fractionation. Alternatively, the protein or peptide
mixture to which libraries of antibody affinity reagents are required can be
immobilised and used as the target for the pre-selection of suitable affinity reagents
which are then dispersed into pools or used as individual reagents. For chemical tags,
individual monoclonal antibodies are used to specifically bind to individual tags in
order to achieve subsequent fractionation.

The fifth aspect of the present invention includes the use of protein affinity reagents
other than monoclonal antibodies where such reagents can facilitate the fractionation
of peptides or proteins prior to mass analysis. Such affinity reagents would include
molecules of the immune system which selectively bind certain peptides such as major
histocompatibility proteins and T cell receptors. Other protein affinity reagents would
include protein domains commonly involved in protein-protein binding interactions
such as SHI domains. Included in the present invention is the concept of cyclising
peptides including within mixtures and especially when bound to solid phases by, for
example, linking cysteine residues under reducing conditions. One method for this
would be to add an additional cysteine residue at an exposed N or C terminal on
immobilised peptides using, for example for C terminal immobilised peptides,
standard conditions of peptide synthesis or using reverse proteolysis whereby certain
proteases such as carboxypeptidase Y and lysyl endopeptidase work in reverse. Included
in the fifth aspect is also a method for further fractionating proteins or peptides by adding, usually
at the N terminus, amino acids which form part of the recognition sequence of a
protease which specifically cleaves at a recognition sequence of two or more amino acids
whereby one or more terminal amino acids in the protease recognition site is provided
by the starting protein or peptide. In this manner, only a fraction of the proteins or
peptides to which the new amino acids are added will be then subject to terminal
protease cleavage by virtue of the newly created sequence. In this manner, proteins or
peptides can be tagged with additional amino acids usually at the N terminus creating,
in a fraction of the thus tagged mixture, a specific protease cleavage site. The proteins
or peptides can then, for example, be immobilised via the new terminus, for example
using a tagged terminal amino acid or by adding a chemical tag to the terminus,
whereby an affinity reagent is then used to immobilise the tagged moieties. After
removing non-immobilised untagged molecules, the proteins or peptides can then be
subjected to cleavage with the specific protease which will then only cleave where the
cleavage site has been generated by a combination of synthesis-derived amino acids
and the original protein or peptide-derived amino acids. The cleaved peptides can then
be fractionated using protein affinity reagents and mass analysed (or further processed
prior to mass analysis) thus representing a subset of the peptide mixture. By using
parallel synthesis of specific amino acids to exposed termini followed by
immobilisation and cleavage, large mixtures of proteins or peptides can be fractionated
on the basis of their terminal amino acid(s). An example of a protease recognition site
is ile, glu, gly, arg which is cleaved between gly and arg by Factor Xa. The sequence
ile, glu, gly could be synthesised onto the N terminus of a protein or peptide and thus
if the adjacent amino acid in the protein or peptide sequence were arg, the cleavage
site would be created and could be cleaved by Factor Xa. Other examples of protease
cleavage sites are asp, asp, asp, asp, lys, cleaved by enterokinase between asp and lys;
pro, gly, ala, ala, his, tyr cleaved between his and tyr by genenase I; leu, val, pro, arg,
gly, ser cleaved between arg and gly by thrombin. N terminal addition of partial
sequence asp, asp, asp, asp could be used to identify proteins or peptides with N
terminal lys (cleaved by enterokinase), pro, gly, ala, ala, his to identify
proteins/peptides with N terminal tyr (cleaved by genenase), leu, val, pro, arg to
identify N terminal gly, ser; or leu, val, pro, arg, gly to identify N terminal ser (cleaved
by thrombin). Other proteases such as the MMP's (matrix metalloproteinases) with
specific recognition sites could be used to fractionate proteins with other N terminal
amino acids. Different protease recognition sites could thus be used in combination
with the proteases to fractionate proteins or peptides according to the N terminal
amino acid. As an alternative, one or more amino acids added to the free N
terminus of a peptide could be used to create a site for binding by an affinity reagent
including where such a site is dependent on one or more of the N terminal amino acids
from the peptide. Thus, different peptides or groups of peptides could be distinguished
by the addition of amino acids to the N terminus which creates, in a manner dependent
on the N terminal amino acids, a site for protease digestion or a site for binding by an
affinity reagent. Where proteins are used as the starting material especially from
mammalian cells whereby the N terminal amino acid is methionine, this can be removed if
required by, for example, formylation and cleavage by a bacterial protease specific for
removal of terminal formylmethionine.
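
The site-completion logic described above can be sketched in Python as follows. The recognition sequences are those listed in the text, written in one-letter code; the function names and example peptides are hypothetical. The sketch models only whether a complete recognition site is created by the added residues together with the peptide's own N terminal residues, and does not model where within the site the protease cuts.

```python
# Recognition sequences as listed in the text (one-letter codes).
RECOGNITION_SITES = {
    "Factor Xa": "IEGR",
    "enterokinase": "DDDDK",
    "genenase I": "PGAAHY",
    "thrombin": "LVPRGS",
}


def completes_site(added, peptide, site):
    """True if `added` residues plus the peptide's own N terminus spell `site`."""
    # The added residues must be a proper, non-empty prefix of the site,
    # so that at least one residue is supplied by the peptide itself.
    if not 0 < len(added) < len(site) or not site.startswith(added):
        return False
    return (added + peptide).startswith(site)


def fractionate_by_n_terminus(peptides, protease, n_added):
    """Split peptides into (cleavable, resistant) after adding a partial site."""
    site = RECOGNITION_SITES[protease]
    added = site[:n_added]  # partial recognition sequence synthesised onto the N terminus
    cleavable = [p for p in peptides if completes_site(added, p, site)]
    resistant = [p for p in peptides if not completes_site(added, p, site)]
    return cleavable, resistant


# Hypothetical peptides with N terminal arg, lys and gly-ser respectively.
peptides = ["RTYK", "KAAG", "GSLL"]
print(fractionate_by_n_terminus(peptides, "Factor Xa", 3))
print(fractionate_by_n_terminus(peptides, "thrombin", 4))
```

Running the same peptide mixture against several partial sites in parallel gives one cleavable subset per N terminal residue pattern, which corresponds to the parallel fractionation by terminal amino acid(s) outlined above.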

Protein affinity reagents are an important aspect of the fifth aspect of the present
invention and can be used either for broad fractionation of groups of proteins/peptides
or for specific fractionation of individual proteins/peptides. For fractionation, it is first
necessary to prepare fractions of, or individual, protein affinity reagents which bind to
a specific fraction or specific peptide and not to other fractions/peptides. A convenient
method is to fractionate the proteins or peptides prior to isolation of the protein affinity
reagents. In the case of antibodies as the protein affinity reagents, such
proteins/peptides can then be used either to bind displayed antibodies from a library or
can be used to immunise animals for generation of antisera. Where a library of
recombinant antibody binding sites such as single-chain Fv's is used, gene clones
encoding these can be retrieved after binding to protein/peptide fractions providing a
replicable source of the affinity reagents for subsequent isolation of the specific
protein/peptide fraction. Individual single-chain Fv's may, in parallel, be screened for
binding specificity, for example by analysing peptide binding by MALDI-ToF. In this
case, single-chain Fv's which bind to a single peptide from a large protein mixture are
retained (in practice, those binding up to three peptides are also retained) as gene
clones for subsequent individual use or use within a mixture of Fv's for isolation of a
protein/peptide fraction from the mixture. It will be appreciated that free N termini
from proteins are often good targets for isolation of very specific antibodies and
therefore capture and release of N terminal peptides from a protein will particularly
favour subsequent antibody isolation. Certain Fv's may be useful for the elimination
of abundant proteins or peptides from the mixture. It will be appreciated that retention
and characterisation of the binding of single-chain Fv's may also provide a means to
reduce redundancy by eliminating Fv's with the same specificity as other Fv's.

The various embodiments of the fifth aspect of the present invention cover 
combinations of protein digestion/cleavage, fractionation with protein affinity reagents 
and mass analysis with an optional step of fractionation using affinity tags for specific 
sequences or structures in the proteins or peptides, and an optional step of chemical 
tagging with fractionation by virtue of these tags. The different aspects encompass 
different sequences of these steps as follows: 

1 - repeated digestion/cleavage cycles and mass analysis 
2 - digestion/cleavage, fractionation with protein affinity reagents, mass analysis 
3 - fractionation with protein affinity reagents, digestion/cleavage, mass analysis 
4 - terminal chemical tagging, digestion/cleavage, fractionation with affinity reagents, 
mass analysis 
5 - as 3 but with additional cycle(s) of tagging, digestion/cleavage, fractionation 
6 - as 4 but with repeated tagging, digestion/cleavage cycles and mass analysis 

The fifth aspect of the present invention should be considered to encompass these and 
related protein/peptide processing steps with the core objective of reducing the 
complexity of protein mixtures in order to achieve mass analysis of the resultant 
protein/peptide fractions. 

The currently common method for operation of the invention involves tagging the N 
and/or C terminus of a mixture of proteins (either natural or encoded by cDNA 
libraries), cleaving with a protease, immobilising the N and/or C terminal peptide 
fragments, and releasing and subjecting the peptides to mass analysis. Alternatively, 
the N or C termini may be modified by addition of amino acids prior to cleavage with 
a sequence-specific protease. Prior to mass analysis, the peptides are used to bind 
protein affinity reagents such as antibodies whereby these antibodies have been pre-
selected to fractionate the peptides or are themselves retained as affinity reagents. The 
mixture of proteins may be pre-fractionated, for example by size, or may be produced 
from cDNA libraries which are pre-fractionated by segregation of clones. The 
retained protein affinity reagents are then used to analyse complex samples of proteins 
whereby the antibodies are used to bind peptides which are then mass analysed. 

It will be appreciated that many of the same principles described herein for the 
digestion/cleavage, fractionation and mass analysis of proteins can also be applied to 
other polymeric molecules such as DNA or RNA. In the case of DNA or RNA, free 
phosphate and hydroxyl groups at the 5' and 3' termini respectively provide a means 
for very specific addition of chemical tags or direct binding to a solid phase. Sequence 
specific restriction or modification enzymes provide for cleavage or modification of 
DNA molecules. Useful affinity reagents for DNA or RNA are nucleic acids 
themselves which can be specifically hybridised to a complementary DNA or RNA 
sequence with attachment to a solid phase either before or after hybridisation. Using 
such methods, complex mixtures of nucleic acids can be fractionated and then 
subjected to mass analysis especially using mass spectrometry. 

The invention is illustrated by the following examples, which should not be considered 
as limiting in scope. 

Example 1 

The experiments described in the present example were conducted using a pair of modified 
single chain antibody (scAb) genes. Two modified scAbs were prepared consisting of 
N-terminal epitope tags, the heavy chain variable region (VH), a 14 amino acid linker 
(EGKSSGSGSESKVD), the light chain variable region (VL) each fused to the b-zip domain 
from either the c-jun or c-fos genes. 

These constructs were cloned into the vector pET5c (Rosenberg AH et al., Gene, 56: 125-135, 
1987) which provides a T7 promoter followed by the ribosome binding site from T7 gene 10. 
The scAb constructs were inserted into the vector at an NdeI site such that the sequence 
encoding the epitope tag followed the first ATG of T7 gene 10. The first construct consisted of 
a scAb against Pseudomonas aeruginosa (Molloy P. et al. Journal of Applied Bacteriology, 78: 
359-365, 1995) with the FLAG epitope (MDYKDDDDK) (Knappik A and Pluckthun A, 
BioTechniques, 17: 754-761, 1994) added at the N terminus, and the b-zip domain of c-fos 
(Abate, C. et al Proc. Natl. Acad. Sci. USA. 87: 1032-1036, 1990) at the C-terminal region of 
the protein. The second consisted of a scAb constructed from the anti-foetal antigen antibody 
340 (Durrant LG et al. Prenatal Diagnosis, 14: 131-140, 1994) with a poly-histidine tag at the 
N terminus, and the b-zip domain of c-jun (Abate C. et al, ibid) at the C-terminal region of the 
protein. 

The anti-Pseudomonas aeruginosa (α-Ps-fos) scAb and the 340-jun scAb were constructed as 
described below: 

DNA for the α-Ps scAb in the vector pPM1His (Molloy P et al., ibid) was amplified with the 
primers RD 5' FLAG: 5'gcggatcccatatggactacaaagacgatgacgacaaacaggtgcagctgcag3' 
(Genosys Biotechnologies Europe Ltd, Cambridge, UK) and RD 3': 
5'gcgaattcgtggtggtggtggtggtgtgactctcc3' (Genosys) which introduced the 5' FLAG epitope 
sequence and removed the 3' stop codon respectively. The reaction mixture included 0.1 µg 
template DNA, 2.6 units of Expand™ High Fidelity PCR enzyme mix (Boehringer Mannheim, 
Lewes, UK), Expand HF buffer (Boehringer Mannheim), 1.5 mM MgCl2, 200 µM 
deoxynucleotide triphosphates (dNTPs) (Life Technologies, Paisley, UK) and 25 pmoles of 
each primer. Cycles were 96°C 5 minutes, followed by [95°C 1 minute, 50°C 1 minute, 72°C 
1 minute] times 5, [95°C 45 seconds, 50°C 1 minute, 72°C 1 minute 30 seconds] times 8, 
[95°C 45 seconds, 50°C 1 minute, 72°C 2 minutes] times 5, finishing with 72°C 5 minutes. 
The 1123 bp product obtained was cut with BamHI and EcoRI and cloned into the vector 
pUC19 (Boehringer Mannheim). The DNA sequence was confirmed using the Thermo 
Sequenase radiolabeled terminator cycle sequencing kit with [33P] dideoxy nucleotides 
(Amersham Life Science, Amersham, UK). The construct was cloned into pET5c vector 
(Promega UK Ltd, Southampton, UK) as an NdeI to EcoRI fragment (see Molecular Cloning, 
A Laboratory Manual eds. Sambrook J, Fritsch EF, Maniatis T. Cold Spring Harbor 
Laboratory Press 1989, New York, USA). Plasmid DNA was prepared using Wizard® Plus 
SV Minipreps DNA purification System (Promega UK Ltd), or for larger scale, Qiagen 
Plasmid Midi Kit (Qiagen Ltd, Crawley, UK). The new plasmid generated was named pET5c 
FLAG-αPs scAb. 

The fos cassette was assembled by PCR of overlapping oligonucleotides: 
Fos1for 5'-atggaattcctcgagaccgacaccctacaggcggaaaccgaccagctgga 
Fos50rev 5'-tcgcgatttcggtttgcagcgcggatttttcgtcttccagctggtcggtt 
Fos71for 5'-aaaccgaaatcgcgaacctgctgaaagaaaaagaaaagctggagttcatc 
Fos155rev 5'-ggaagcttgaattccgccggacggtgtgccgccaggatgaactccagctt 

The above oligonucleotides were included in a reaction mix at 1pmol each, and the reaction 
was driven using 10pmol primers Fos1fS; 5'-atggaattcctcgagacc and Fos155rS; 5'-
ggaagcttgaattccgcc using high fidelity polymerase and reaction components as previously. 
The resulting 155bp product was digested with EcoRI, purified and cloned into EcoRI cut 
pUC19 for sequence analysis using standard procedures (see Molecular Cloning, A 
Laboratory Manual ibid). The Fos cassette was sub-cloned into the pET5c FLAG-αPs scAb 
plasmid as an XhoI-EcoRI fragment by substitution of the existing 320bp XhoI-EcoRI 
fragment carrying the human constant region domain. 
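
The assembly of the fos cassette from the four overlapping oligonucleotides can be checked in silico. The following is a minimal sketch only: the four sequences are copied from the text, while the function names and the simple suffix/prefix overlap rule are illustrative assumptions and are not part of the described method. Under these assumptions the joined product is 155 nucleotides long, in agreement with the 155bp product stated above.

```python
# Illustrative sketch: join overlapping oligonucleotides by their shared ends.
# Sequences are taken from the text; helper names are hypothetical.

def revcomp(seq):
    """Return the reverse complement of a lower-case DNA sequence."""
    return seq.translate(str.maketrans("acgt", "tgca"))[::-1]

def overlap(left, right):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0

def assemble(oligos):
    """Join oligos given as (sequence, is_reverse) pairs into one top strand."""
    strands = [revcomp(seq) if is_reverse else seq for seq, is_reverse in oligos]
    product = strands[0]
    for nxt in strands[1:]:
        k = overlap(product, nxt)
        if k == 0:
            raise ValueError("adjacent oligonucleotides do not overlap")
        product += nxt[k:]
    return product

FOS_OLIGOS = [
    ("atggaattcctcgagaccgacaccctacaggcggaaaccgaccagctgga", False),  # Fos1for
    ("tcgcgatttcggtttgcagcgcggatttttcgtcttccagctggtcggtt", True),   # Fos50rev
    ("aaaccgaaatcgcgaacctgctgaaagaaaaagaaaagctggagttcatc", False),  # Fos71for
    ("ggaagcttgaattccgccggacggtgtgccgccaggatgaactccagctt", True),   # Fos155rev
]

fos_cassette = assemble(FOS_OLIGOS)
print(len(fos_cassette))
```

The outer primers Fos1fS and Fos155rS correspond to the two ends of the joined sequence, which is why they can drive amplification of the full-length cassette.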

The 340 scAb was produced by substituting the VH and VK of the 340 antibody in place of 
the α-Ps VH and VK in pPM1His. The 340 VH was amplified with the primers 
5'cagctgcaggagtctgggggaggcttag3' (Genosys) and 5'tcagtagacggtgaccgaggttccttgaccccagta3' 
(Genosys). The reaction mixture included 0.1 µg template DNA, 2.6 units of Expand™ High 
Fidelity PCR enzyme mix, Expand HF buffer, 1.5 mM MgCl2, 200 µM dNTPs and 25 pmoles 
of each primer. Cycles were 96°C 5 minutes, followed by [95°C 1 minute, 50°C 1 minute, 
72°C 1 minute] times 5, [95°C 45 seconds, 50°C 1 minute, 72°C 1 minute 30 seconds] times 
8, [95°C 45 seconds, 50°C 1 minute, 72°C 2 minutes] times 5, finishing with 72°C 5 minutes. 
The 357 bp product was cut with PstI and BstEII and cloned into PstI and BstEII cut pPM1His 
(see Molecular Cloning, A Laboratory Manual, ibid). Similarly, the 340 VK was amplified 
with the primers 5'gtgacattgagctcacacagtctcct3' and 5'cagcccgttttatctcgagcttggtccg3' 
(Genosys). The 339 bp product was cut with SstI and XhoI and cloned into SstI and XhoI cut 
modified pPM1His (produced above). The DNA sequence was confirmed using the Thermo 
Sequenase radiolabeled terminator cycle sequencing kit with [33P] dideoxy nucleotides as 
before. DNA for the 340 scAb in the vector pPM1His was amplified with the primers RD 5' 
HIS: 5'gcggatcccatatgcaccatcatcaccatcaccaggtgcagctgcag3' (Genosys) and RD 3' (given 
above) which introduced the 6 histidine residues at the 5' end and removed the 3' stop codon 
respectively. Reagents and conditions for amplification were exactly as for the α-Ps construct. 
The 1114 bp product obtained was cut with BamHI and EcoRI and cloned into the vector 
pUC19 (see Molecular Cloning, A Laboratory Manual, ibid). The DNA sequence was 
confirmed as before and the construct was cloned into pET5c vector as an NdeI to EcoRI 
fragment to generate the plasmid pET5c HIS 340 scAb. 
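
The epitope tags introduced by the two 5' primers can be read directly from the primer sequences. The sketch below is illustrative only: the primer sequences are copied from the text, the standard genetic code is assumed, and the helper name is hypothetical. Translation is started at the first ATG, which in both primers lies within the NdeI site (catatg).

```python
# Illustrative sketch: translate the tag-encoding 5' primers from the first ATG.
# Standard genetic code, bases ordered t, c, a, g.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate_from_first_atg(dna):
    """Translate complete codons starting at the first 'atg' in the sequence."""
    dna = dna.lower()
    start = dna.find("atg")
    if start < 0:
        return ""
    coding = dna[start:]
    usable = len(coding) - len(coding) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[coding[i:i + 3]] for i in range(0, usable, 3))

RD5_FLAG = "gcggatcccatatggactacaaagacgatgacgacaaacaggtgcagctgcag"
RD5_HIS = "gcggatcccatatgcaccatcatcaccatcaccaggtgcagctgcag"

flag_peptide = translate_from_first_atg(RD5_FLAG)
his_peptide = translate_from_first_atg(RD5_HIS)
print(flag_peptide, his_peptide)
```

On this reading the FLAG primer encodes MDYKDDDDK followed by the first residues of the VH (QVQLQ), and the HIS primer encodes a methionine followed by six histidines and the same VH residues.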

The jun cassette was assembled by PCR of overlapping oligonucleotides: 
Jun1for 5'-atgagaattctcgagcgtatcgctcgtctggaagaaaaagttaaaaccct 
Jun85rev 5'-tagcggtggaagccagttcggagttctgagctttcagggttttaactttt 
Jun71for 5'-tggcttccaccgctaacatgctgcgtgaacaggttgctcagctgaaacag 
Jun146rev 5'-catgcgaattcgtggttcataactttctgtttcagctgagcaacc 

The above oligonucleotides were included in a reaction mix at 1pmol each, and the reaction 
was driven using 10pmol primers Jun1for-S; 5'-atgagaattctcgagcg and Jun146rev-S; 5'-
catgcgaattcgtggttc using high fidelity polymerase and reaction components as previously. The 
resulting 146bp product was digested with EcoRI, purified and cloned into EcoRI cut pUC19 
for sequence analysis using standard procedures (see Molecular Cloning, A Laboratory 
Manual ibid). The Jun cassette was sub-cloned into the pET5c HIS 340 scAb plasmid as an 
XhoI-EcoRI fragment by substitution of the existing 320bp XhoI-EcoRI fragment carrying the 
human constant region domain. 

Plasmids his-340-jun and FLAG-αPs-fos were used as templates for PCR using biotinylated 
primer BioT7; 5'-agatctcgatcccgcaaatta and primer petrev; 5'-aaataggcgtatcacgaggcc. Primers 
were supplied by Genosys (Cambridge, UK) and used in the reaction at a concentration of 
1pmol. Components and PCR conditions were as previously. The his-340-jun reaction 
product was 992bp, and the FLAG-αPs-fos reaction product was 1002bp. The products were 
purified using a spin purification cartridge (Qiagen, Crawley, UK) and diluted to 100ng/µl 
concentration. Quantitation was by UV absorbance at 260nm. 500ng biotin labelled DNA 
was reacted with 10µl streptavidin coated magnetic particles (Bangs Labs, Fishers, USA). The 
reaction was conducted in a siliconised microcentrifuge tube in a volume of 500µl PBS 1% 
(w/v) BSA for 10 minutes at room temperature. Following binding, the particles were 
collected by magnet (Dynal, Bromborough, UK) and washed three times using PBS 1% BSA. 

Following the final wash, the in vitro translation reaction was initiated by addition of 25µl T7 
Quick coupled transcription translation mix (Promega, Southampton, UK) supplemented with 
biotinyl lysine tRNA (Promega). The translation reaction was conducted at 30°C for 60 
minutes then placed on ice. Particles were collected by magnet, and washed using ice cold 
PBS containing 1% BSA. 

In some experiments, non-magnetic streptavidin particles were used in IVTT reactions (Bangs 
Labs, Fishers, USA). In such cases particles were recovered during wash cycles by 
centrifugation. 

In some experiments, coloured streptavidin particles, magnetic and non-magnetic (Bangs 
Labs), were used in IVTT reactions. 

In some experiments translation products bound to the particles were detected using antibodies 
for either the FLAG or the his6 epitope engineered into each of the model gene constructs. 
Antibodies were added to the washed particles diluted in PBS. Incubations were for 60 
minutes at 4°C with gentle mixing. A secondary reagent (anti-mouse-HRP conjugate) was 
added at the recommended dilution in PBS and incubated for a further 30 minutes at 4°C. 
Particles were washed three times using 200µl PBS before colour development with the 
chromogenic substrate. Reactions were read at 492nm. 

Protein-protein binding reactions were conducted using IVTT proteins bound to the particle 
surface. In such experiments, non-magnetic streptavidin particles were "captured" by protein 
mediated (fos:jun) binding to the surface of magnetic particles. Magnetic particles with fos 
IVTT product were mixed gently with non-magnetic particles with jun bound on the surface. 
The reaction was conducted in 100µl PBS, BSA and allowed to proceed at room temperature 
for 30 minutes. In a negative control reaction, non-magnetic particles with a scAb protein (α-Ps 
scAb; Molloy P et al., ibid) bound on the surface were mixed with the magnetic particles 
coated with the fos IVTT product. Following incubation, the particles were captured by 
magnet and washed six times using PBS, 1% BSA. 

The presence of the captured target protein gene was confirmed using PCR and DNA 
sequencing. For detecting the jun model gene, jun specific primers Jun1for-S and Jun146rev-S 
were used in a PCR assay. The assay was initiated by addition of 10% (v/v) particles directly 
into the PCR mix. Components and reaction conditions were as previously. The 146bp jun 
specific product was detected by gel electrophoresis. For detecting the fos model gene, 
primers Fos1fS and Fos155rS were used in a PCR assay. Reaction conditions and detection 
of the 155bp fos specific product were as above. For detecting the negative control protein, 
primers Seq1scab; 5'-agatccctactataggta and Seq2scab; 5'-ggtgagctcgatgtatcc were used to 
detect a 115bp product in the α-Ps scAb protein gene. 

In the above experiments, Jun PCR products were detected following capture by fos magnetic 
particles under conditions where no α-Ps scAb PCR products could be detected following 
interaction with the fos magnetic particles. 

Example 2 

In this example a single-chain antibody library was produced including unique peptide 
"barcodes". Human peripheral blood lymphocyte RNA was prepared according to 
standard procedures. Briefly, lymphocytes were prepared from 10ml heparinised 
blood taken from 16 normal healthy donors. Lymphocytes were collected following a 
density gradient centrifugation procedure using Lymphoprep medium (Sigma, Poole, 
UK). RNA was prepared using the QuickPrep system and instructions provided by the 
supplier (Pharmacia, St Albans, UK). Synthesis of cDNA was conducted using a 
cDNA synthesis kit (Pharmacia, St Albans, UK) and random hexamer primers with 
conditions recommended by the supplier. Immunoglobulin heavy chain variable 
region (Vh) and light chain variable regions (Vl) were amplified from cDNA in 
separate PCR mixes using primer sets designed to maximise Vh and Vl repertoires. 
Primer sets were as described previously (Marks J.D. et al 1991, Eur. J. Immunol. 21: 
985). Vh and Vl PCR reactions were conducted using 2.6 units of Expand™ High 
Fidelity PCR enzyme mix (Boehringer Mannheim, Lewes, UK), Expand HF buffer 
(Boehringer), 1.5 mM MgCl2, 200 µM deoxynucleotide triphosphates (dNTPs) (Life 
Technologies, Paisley, UK) and 25 pmoles of each primer pool. Cycles were 96°C 5 
minutes, followed by [95°C 1 minute, 50°C 1 minute, 72°C 1 minute] times 5, [95°C 
45 seconds, 50°C 1 minute, 72°C 1 minute 30 seconds] times 8, [95°C 45 seconds, 50°C 
1 minute, 72°C 2 minutes] times 5, finishing with 72°C 5 minutes. 
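
The three-stage cycling programme described above can be written down as data and summarised. This is a minimal sketch only: the temperatures, times and repeat counts are those given in the text, whereas the data layout and function names are illustrative assumptions, and ramp times between temperatures are ignored.

```python
# Illustrative sketch of the PCR cycling programme as a data structure.
# Each stage is (number of repeats, [(temperature in deg C, seconds), ...]).
PROGRAMME = [
    (1, [(96, 300)]),                       # initial denaturation
    (5, [(95, 60), (50, 60), (72, 60)]),    # stage 1: 1 minute extension
    (8, [(95, 45), (50, 60), (72, 90)]),    # stage 2: 1 minute 30 seconds extension
    (5, [(95, 45), (50, 60), (72, 120)]),   # stage 3: 2 minute extension
    (1, [(72, 300)]),                       # final extension
]

def amplification_cycles(programme):
    """Count repeats of the multi-step (denature/anneal/extend) stages only."""
    return sum(repeats for repeats, steps in programme if len(steps) > 1)

def hold_time_seconds(programme):
    """Total programmed hold time, excluding temperature ramping."""
    return sum(repeats * sum(sec for _, sec in steps) for repeats, steps in programme)

print(amplification_cycles(PROGRAMME), hold_time_seconds(PROGRAMME) / 60)
```

Written this way, the programme comprises 18 amplification cycles in total, with the extension time lengthened in the later stages.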

In a separate PCR, a linker fragment of form (Gly4Ser)3 (Huston J.S. et al 1988, PNAS, 
85: 5879-5883) was amplified from a cloned template pSW1-ScFvD1.3 (McCafferty 
et al, 1990, Nature 348: 522-554) using primer sets detailed previously (Marks, J.D. 
in Antibody Engineering, ed Borrebaeck C.A.K., New York O.U.P., 1995). The 93bp 
linker fragment product was annealed together with an equimolar mixture of the Vh 
and Vl PCR products. The mixture was further amplified in a "pull through" reaction 
using flanking primers HuVHBACKsfi and HuFORNot as detailed in Vaughan et al 
(Vaughan T.J. et al 1996, Nature Biotech. 14: 309-314). All fragments used in the 
pull-through reaction were purified free of their initial primers prior to inclusion in the 
reaction. Purification was conducted using the Wizard PCR Preps system from 
Promega (Promega, Southampton UK). 

The assembled contig of form Vh-linker-Vl was digested with restriction enzymes 
SfiI and NotI (Boehringer) using standard conditions and purified as above. The 
purified fragment was annealed with a double stranded synthetic oligonucleotide 
adapter mix designed to introduce a V8 protease cleavage site juxtaposed with a tract 
of randomised sequence in frame with the C-terminus of the Vl gene. This V8/unique 
sequence barcode was produced by annealing a pair of synthetic oligonucleotide pools 
of form 5'-ggccgcgaggaagaggaa[(atg)/(can)/(agn)/(aan)/(gan)/(ttn)]12gc-3' and 5'-
ggccgc[(naa)/(ntc)/(ngt)/(nct)/(nag)/(cat)]12ctccttctcctcgc-3'. This linker has NotI 
compatible ends (underlined) and therefore facilitates the insertion of the complete 
single chain antibody-V8/unique sequence barcode fragment into SfiI-NotI prepared 
pCANTAB 5 (Pharmacia) phagemid vector. 

The unique sequence barcode was designed to avoid the introduction of stop codons 
and further biased to exclude encoding residues with greater than two alternative 
codons. By this strategy, the number of specific oligonucleotides required to identify 
a given de-coded peptide sequence is minimised. In all, the unique sequence barcode 
is able to encode 11 of the 20 amino-acids. In addition to the V8 peptidase cleavage 
site (a string of 4 glutamic acid residues), the sequence barcode is 12 codons long. 
Thus from the repertoire of 11 amino acids (10 of which are encoded by either of two 
codons), it is able to encode 11^12/2 = ~1.5x10^12 different peptides. 
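
The amino acid repertoire and the size of the barcode can be reproduced from the degenerate codon pool given for the adapter. The following is a sketch under stated assumptions: the six degenerate codons are those of the first oligonucleotide pool, n is taken to mean any of the four bases, the standard genetic code is assumed, and the division by two simply follows the figure stated in the text. Variable and function names are illustrative.

```python
# Illustrative sketch: expand the degenerate barcode codons and count peptides.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

BARCODE_POOL = ["ATG", "CAN", "AGN", "AAN", "GAN", "TTN"]  # from the adapter oligo
BARCODE_CODONS = 12                                        # barcode length stated in the text

def expand(codon):
    """Expand a codon containing N (any base) into its concrete codons."""
    options = [BASES if base == "N" else base for base in codon]
    return [a + b + c for a in options[0] for b in options[1] for c in options[2]]

# Map each encoded amino acid to the concrete codons that specify it in the pool.
encoded = {}
for degenerate in BARCODE_POOL:
    for codon in expand(degenerate):
        encoded.setdefault(CODON_TABLE[codon], []).append(codon)

two_codon_residues = [aa for aa, codons in encoded.items() if len(codons) == 2]
barcode_peptides = len(encoded) ** BARCODE_CODONS // 2  # 11^12 / 2, as stated in the text

print(sorted(encoded), len(two_codon_residues), barcode_peptides)
```

Under these assumptions the pool encodes 11 amino acids without any stop codon, ten of them by exactly two codons each (methionine by one), and 11^12/2 evaluates to approximately 1.57x10^12.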

The assembled scFv fragment (Vh-linker-Vl) with SfiI and NotI prepared ends was 
annealed and ligated to the NotI sequence-barcode adapter and re-purified. For 
experiments expressing the human scFv library by phage display, the complete 
fragment was ligated into SfiI-NotI prepared pCANTAB 5 (Pharmacia) phagemid 
vector, and transformed into competent TG1 E. coli. 

For other experiments using in vitro transcription and translation (IVTT), the 
assembled scFv library was subcloned into SfiI-NotI prepared pCANTAB5-T7. This 
vector is the same as the commercially available pCANTAB5 except it was modified 
to include the T7 promoter sequence (ttaatacgactcactata) inserted at the HindIII site at 
position 2235. The modification was achieved by ligation of a double-stranded 
synthetic DNA linker of sequence 5'-agctaatacgactcactata into HindIII cut and de-
phosphorylated pCANTAB5. Recombinant clones containing the T7 promoter were 
selected using a diagnostic PCR. 

Following ligation and transformation into competent TG1 E. coli, cells were grown 
for 1 hour in 1ml of SOC medium and then plated onto TYE medium with 100µg/ml 
ampicillin. Colonies were scraped off plates into 5ml of 2x TY broth containing 
ampicillin. The cultured library was used to prepare DNA for IVTT reactions. 

The pCANTAB5-T7 scFv library DNA was used in an in vitro translation reaction. 
The IVTT was conducted using the T7 Quick coupled transcription translation mix 
(Promega, Southampton, UK) and 10µg of the pCANTAB5-T7 scFv library DNA in a 
total volume of 50µl. The translation reaction was conducted at 30°C for 90 minutes 
then placed on ice. In some experiments reactions were monitored for the presence of 
translation products using 35S-methionine incorporation assays. Reactions were stored 
at -70°C prior to use in binding and screening assays. 

The single-chain antibody library was used in a binding reaction with recombinant 
human p53 protein (Oncogene Research Products-Calbiochem, Nottingham, UK). The 
IVTT mix was diluted 10 fold in PBS and used in a binding assay to human 
recombinant p53 protein immobilised in a 96-well microplate. The p53 protein was 
immobilised by overnight incubation at a concentration of 100µg/ml in phosphate 
buffer at 4°C. The plate was washed using PBS 0.5% (w/v) BSA and the diluted IVTT 
mix added to the test and control wells for binding. The binding reaction was 
conducted at 37°C for 90 minutes. The plate was washed x3 using PBS-T (PBS + 
0.05% v/v Tween-20) and subjected to V8 protease digestion (Takara, Wokingham, 
UK). Protein fragments were collected from the supernatant and size fractionated to 
exclude the V8 protease and other large species before analysis by MALDI-tof. 

MALDI-tof fragment analysis identified a number of peptide fragments. The peptide 
sequences were used to design a set of corresponding synthetic oligonucleotides. The 
oligonucleotides were used in a PCR based screen of the single chain library. Pfu 
turbo (Stratagene Europe) DNA polymerase was used to synthesise complementary 
strands in members of the human single-chain antibody library DNA. Following 15 
rounds of thermal cycling, the product was subjected to DpnI digestion. This step 
depleted the mixture of parental plasmid molecules to ensure that only the newly 
synthesised primed products were propagated. 1µl of the reaction was transformed 
into TG1 competent cells and plated onto LB plates containing 100µg/ml ampicillin. 
Individual clones were picked, expanded and DNA prepared according to standard 
procedures. The DNA was used directly in a second round of screening involving 
IVTT, antigen binding, V8 protease digestion and MALDI-tof fragment analysis. After 2 
rounds of selection 6 scFv's were isolated which bound recombinant p53. 

Example 3 

The experiments described in the present example were conducted using an Fab 
expression vector pC5A8-03, the construction of which is as follows. The vector 
pC5A8-01 is based on the vector pLITMUS28 (New England Biolabs, MA, USA) 
which provides an inducible lac promoter and an M13 origin of replication. The Fab 
region of the antibody was assembled from two DNA fragments encoding the variable 
region (VH) and first constant region (CH1) of the heavy chain and the variable region 
(VK) and constant region (CK) of the kappa light chain of a humanised monoclonal 
antibody 5A8 directed against CD4 (Reimann KA. et al., AIDS Research and Human 
Retroviruses 13, 11: p933, 1997). These fragments were fused to the pelB leader 
sequence (Lei S-P. et al. Journal of Bacteriology, 169: 9: 4379-4383, 1987) and 
inserted between the BglII and Bst98I restriction sites of pLITMUS28 as described 
below. All following molecular biology procedures will be familiar to those skilled in 
the art and can be found in Molecular Cloning, A Laboratory Manual eds. Sambrook 
J., Fritsch EF. and Maniatis T. Cold Spring Harbor Laboratory Press 1989, New York, 
USA. All oligonucleotides were synthesised by Genosys Biotechnologies Europe Ltd., 
Cambridge, UK. Unless otherwise stated, all restriction endonucleases were purchased 
from Life Technologies, Paisley, UK. All polymerase chain reactions were carried out 
using Pfu DNA polymerase (Promega, Southampton, UK). 

In order to assemble the light chain fragment, the pelB leader sequence was amplified 
using the polymerase chain reaction (PCR), using a Hybaid Touchdown Thermal 
Cycler, from clone pPM1-HIS which contains a single-chain antibody fragment 
(scAb) against Pseudomonas aeruginosa (Molloy, P. et al. Journal of Applied 
Bacteriology, 78: 359-365, 1995). This initial reaction was carried out using 
oligonucleotides OL001, which encodes a BglII restriction site, the N terminal residues 
of the pelB leader sequence and the Shine Dalgarno sequence, and OL002, which 
encodes the C-terminus of the pelB leader and the N-terminal residues of a kappa 
light chain from pDIVKV3 (ref). The product of this reaction was purified from 
NuSieve GTG agarose (Flowgen, Lichfield, UK) using a Wizard® PCR purification 
kit (Promega UK Ltd., Southampton, UK), denatured and used, in conjunction with 
OL004 which encodes the junction of the variable and constant regions of the kappa 
light chain, to amplify the variable region of the kappa chain from clone pDIVKV3, by 
PCR using standard protocols. The constant region of the kappa light chain was 
amplified, by PCR, from clone pPM1-HIS (Molloy et al.) using OL003, which encodes 
the C-terminal residues of the variable region and the N-terminal residues of the 
constant region, and OL005, which encodes the C-terminal residues of the constant 
regions of the kappa light chain and the restriction enzyme site EcoRI. These two 
fragments were subsequently amplified by overlap PCR using OL001 and OL005, 
digested with BglII and EcoRI and cloned into pLITMUS28 in order to produce 
pC5A8-01. 

The heavy chain was assembled by amplification of the pelB leader sequence from 
the assembled light chain using OL006, which encodes an EcoRI site and the Shine 
Dalgarno sequence, and OL007, which encodes the C-terminal residues of the pelB 
leader sequence and the N-terminal residues of a heavy chain from pDIVHV4. The 
product of this reaction was used, alongside OL009, which encodes the junction of the 
variable and constant regions of the heavy chain, to amplify the variable region of the 
IgG1 heavy chain from clone pDIVHV4. Exon 1 of the heavy chain constant region 
was amplified, by PCR, from clone pSVgptHuIgG1 using OL008 and OL010, which 
encode the C-terminal residues of the variable regions of the heavy chain and the 
N-terminus of the constant chain (OL008) and the C-terminal residues of exon 1 and the 
restriction site for SstI (OL010). The products of these reactions were amplified by 
overlap PCR using OL006 and OL010, digested with EcoRI and SstI and cloned into 
pLITMUS28 containing the light chain fragments in order to produce pC5A8-02. 

The C-terminal residues of CH1 and a C-terminal FLAG tag sequence (DYKDDDDK) 
(Knappik A. and Pluckthun A. BioTechniques, 17: 754-761, 1994) were added using 
OL011 and OL012 which included the restriction sites EcoICRI and Bst98I in order to 
produce pC5A8-03. Alternatively, these tags could include the 6HIS tag or MS tags 
(see example). 

The oligonucleotides utilised in the production of pC5A8-01, pC5A8-02 and pC5A8-03 
are listed below: 

OL001; 5' GGGCAGATCnTAACnTAAGAAGGAGATATACATATGAAATACCTATTGCCTACGG 3' 
OL002; 5' GGGTCTGGGTCATAACGATATCGGCCATCGCTGGTTGGGCAGC 3' 
OL003; 5' GGTACCAAACTGGAGATCAAACGGACTGTGGCTGCACCATCT 3' 
OL004; 5' AGATGGTGCAGCCACAGTCCGTTTGATCTCCAGTTTGGTACC 3' 
OL005; 5' GATCGAATTCCTAACACTCTCCGCGGTTGAAGCTCTTTG 3' 
OL006; 5' GATCGAATTCTAACTTTAAGAAGGAGATATACATATG 3' 
OL007; 5' GGACTGAACCAGTTGGACTTCGGCCATCGCTGGTTGGGCAGC 3' 
OL008; 5' ACCCTGGTTACCGTCTCCTCAGCCTCCACCAAGGGCCCATC 3' 
OL009; 5' GATGGGCCCTTGGTGGAGGCTGAGGAGACGGTAACCAGGGTAC 3' 
OL010; 5' GATCGAGCTCTGCnTCTTGTCCACCTTGGTGrrGC 3' 
OL011; 5' CCCAAATCTTGCGCTGCAGACTACAAAGACGACGACGACAAATAGCTCGAGC 3' 
OL012; 5' TTAAGCTCGAGCTATTTGTCGTCGTCGTCTTTGTAGTCTGCAGCGCAAGATTTGGG 3' 

The production of functional Fab was demonstrated by ELISA. In summary, the above 
vector was transferred into E. coli strain DH5α and grown at 37°C in the presence of 
100 µg/ml ampicillin and 1% glucose until an OD600 of 0.5 was attained. Protein 
production was induced by the addition of 1mM isopropylthio-β-D-galactoside (IPTG) 
in the absence of glucose. The periplasmic fraction was released by osmotic shock 
using 30mM Tris HCl, 20% sucrose pH8.0, 1mM EDTA followed by 5mM MgSO4 
(Molloy, P. et al. Journal of Applied Bacteriology, 78: 359-365, 1995) and added 
directly to an Immulon 4 ELISA plate (Dynex) which had previously been coated 
overnight with soluble human CD4 (Intracel Corp., Issaquah, WA) at a concentration 
of 1µg/ml in phosphate buffered saline (PBS) pH7.4, at room temperature in a 
humidified chamber. Alternatively, the periplasmic fraction could be released by cell 
lysis or by the addition of 1mM EDTA. Non specific binding was reduced by 
incubating the plate for 1 hour at room temperature with PBS containing 0.05% Tween 
20, 2% bovine serum albumin (BSA) and 0.05% thimerosal (Sigma) prior to addition 
of the soluble Fab. The anti-CD4 specific Fab was detected using goat anti-human 
IgG Fab specific horseradish peroxidase conjugate (Sigma, UK) which was itself 
detected using 3,3',5,5'-tetramethylbenzidine dihydrochloride (TMB) (Sigma, UK) and 
hydrogen peroxide in phosphate/citrate buffer pH5.0. Colour development was 
stopped after 30 minutes using 0.2N H2SO4 and the absorbance monitored at 450 nm. 
Alternatively ABTS/citrate (Sigma, UK) could be used for detection. 

In order to produce a library of CDR3 sequences, unique restriction sites were introduced into 
vector pC5A8-03 by oligonucleotide-directed mutagenesis (Kunkel TA. Proc. Natl. Acad. Sci. 
USA: 488-492 (1985) and Current Protocols in Molecular Biology eds. Ausubel FM., Brent 
R., Kingston RE., Moore DD., Seidman JG., Smith JA., Struhl K. John Wiley & Sons, Inc.) 
using the oligonucleotides listed below. The presence of the AatII and HindIII (5' and 3' to 
LCDR3) and BssHII and SanDI (5' and 3' to HCDR3) restriction sites in the kappa light 
chain and the heavy chain respectively was confirmed by digestion with the appropriate 
restriction enzymes. These plasmids, each containing an additional restriction site, were 
designated pC5A8-04 to pC5A8-07. 



OL013; 5' GAAGACGTCGCTGTTTAC 3' 
OL014; 5' GGTACCAAGCTTGAGATC 3' 
OL015; 5' CTACTGCGCGCGTGAAAAAG 3' 
OL016; 5' GGGTCAGGGGACCCTGG 3' 

Following digestion of pC5A8-07 with AatII and HindIII, the highly variable residues in 
CDR3 of the kappa light chain variable regions were randomised using a mixture of 
degenerate oligonucleotides carrying the anchor residues (aa 83-88 and aa 97-103) and a 10 
nucleotide palindromic sequence at their 3' end which encompasses the restriction 
endonuclease site for HindIII. These oligonucleotides hybridise at their 3' ends and then act as 
a substrate for DNA polymerase resulting in the production of double-stranded homoduplex, 
which is digested with the two restriction enzymes and cloned into the digested vector using 
standard protocols (see Current Protocols in Molecular Biology eds. Ausubel FM., Brent R., 
Kingston RE., Moore DD., Seidman JG., Smith JA., Struhl K. John Wiley & Sons, Inc.). The 
oligonucleotides were prepared such that residues 91, 92, 93, 94, 95, 95A, 95B and 96 were 
randomised by the inclusion of equal concentrations of each nucleotide at each step of the 
oligonucleotide synthesis (Genosys, Cambridge, UK). 

The sequence of the mutagenic oligonucleotides is based on a CDR3 length of 10 
residues. Residues 89 and 90 are relatively conserved and are therefore fixed in this 
example. The residues to be randomised are shown in italics. Additional libraries with 
a CDR3 of 6, 7, 8 or 9 residues can also be created by varying the length of the 
randomised region. 

Positive strand: 5' 
GAAGACGTCGCTGTTTACTACTGCCAGCAGNNSNNSNNSNNSNNSNNSNNSAC 
CTTCGGTGGTGGTACCAAGCTTGG 3' 

Negative strand: 5' 
CCAAGCTTGGTACCACCACCGAAGGTSNNSNNSNNSNNSNNSNNSNNCTGCTG 
GCAGTAGTAAACAGCGACGTCTTC 3' 
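
The coding capacity of the NNS codons used in the mutagenic oligonucleotides can be enumerated as follows. This is a sketch under stated assumptions: N is taken as any of the four bases and S as G or C (the usual IUPAC meaning), the standard genetic code is assumed, and the positive strand is copied from the listing above. The names used are illustrative and the calculation is not part of the described method.

```python
# Illustrative sketch: what NNS codons encode, and the resulting library size.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# N = any base at positions 1 and 2; S = G or C at position 3.
nns_codons = [a + b + c for a in BASES for b in BASES for c in "CG"]
nns_amino_acids = {CODON_TABLE[c] for c in nns_codons} - {"*"}
nns_stop_codons = [c for c in nns_codons if CODON_TABLE[c] == "*"]

# Light chain positive strand as listed in the text (two lines joined).
LIGHT_POSITIVE = (
    "GAAGACGTCGCTGTTTACTACTGCCAGCAGNNSNNSNNSNNSNNSNNSNNSAC"
    "CTTCGGTGGTGGTACCAAGCTTGG"
)
randomised_codons = LIGHT_POSITIVE.count("NNS")

dna_variants = len(nns_codons) ** randomised_codons
peptide_variants = len(nns_amino_acids) ** randomised_codons

print(len(nns_codons), len(nns_amino_acids), nns_stop_codons, randomised_codons)
print(dna_variants, peptide_variants)
```

Restricting the third base to G or C retains all 20 amino acids while leaving TAG as the only stop codon among the 32 possibilities. The number of randomised codons is counted from the strand as listed rather than assumed.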

CDR3 of the heavy chain was randomised using the restriction endonuclease sites 
BssHII and SanDI and the mutagenic oligonucleotides listed below, in a similar 
manner to that described in the previous section. In this case residues 95-100D are 
randomised. The residues to be randomised are shown in italics. Additional libraries 
with a CDR3 of 9, 11 or 12 residues can also be created by varying the length of the 
randomised region. 

Positive strand; 5' 

CTACTGCGCGCGTNNSNNSNNSNNSNNSNNSNNSNNSNNSNNSTTCGCTTACTG 
GGGTCAGGGGACCCCT 3' 

Negative strand; 5' 

AGGGGTCCCCTGACCCCAGTAAGCGAASNNSNNSNNSNNSNNSNNSNNSNNSN 
NSNNACGCGCGCAGTAG 3' 

A library in which both the heavy and light chains contained a randomised CDR3 was 
produced by carrying out both the heavy and light chain mutagenesis methods 
described above. 
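
The mutagenic oligonucleotides above randomise each position with an NNS codon (N = A, C, G or T; S = G or C). The sketch below is a minimal illustration, not part of the described protocol: it enumerates the NNS codons, counts how many of them are stop codons, and estimates the fraction of clones that remain free of stop codons when k positions are randomised independently. The helper names, and the choice of k = 10 to represent heavy chain residues 95-100D, are assumptions made for this example.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def nns_codons() -> list[str]:
    """Enumerate every codon matching the degenerate pattern NNS."""
    return ["".join(c) for c in product("ACGT", "ACGT", "CG")]


def stop_free_fraction(k: int) -> float:
    """Fraction of sequences with no stop codon across k independent NNS codons."""
    codons = nns_codons()
    sense = [c for c in codons if c not in STOP_CODONS]
    return (len(sense) / len(codons)) ** k


codons = nns_codons()
n_codons = len(codons)                           # codons available per position
n_stops = sum(c in STOP_CODONS for c in codons)  # stop codons among them
dna_diversity_10 = n_codons ** 10                # DNA-level variants for 10 positions
fraction_10 = stop_free_fraction(10)             # clones without a premature stop

if __name__ == "__main__":
    print(n_codons, n_stops, dna_diversity_10, round(fraction_10, 3))
```

The calculation treats every codon as equally likely, which matches the equal nucleotide concentrations described for the synthesis but ignores any synthesis or cloning bias.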

In order to increase the efficiency of selection of high affinity binders, the FLAG tag 
mentioned above was replaced with a mass tag using the restriction endonucleases PstI 
and XhoI. In order to increase the library size further, two tags can be used. In this case 
the tags must differ in length by at least two residues in order to be distinguished 
following the removal of tags 1 and 2 with a protease such as Factor Xa. The 
oligonucleotides were designed with a palindromic sequence at their 3' end which 
encompasses the restriction endonuclease site for XhoI. The oligonucleotides hybridise 
at their 3' ends and then act as a substrate for DNA polymerase, resulting in the 
production of a double-stranded homoduplex, which is digested with the two restriction 
enzymes PstI and XhoI and cloned into the digested vector using standard protocols 
(see Current Protocols in Molecular Biology, eds. Ausubel FM., Brent R., Kingston 
RE., Moore DD., Seidman JG., Smith JA., Struhl K., John Wiley & Sons, Inc.). 

As an example, a tag of 8 residues can be created using the oligonucleotide 5' NAC 
NCC NGG NTG TKC VAG GNV CNT 3'. The length of this tag is increased to 11 
residues if a second tag of 8 residues is also included, due to the incorporation of the 
site for protease Factor Xa, which is shown in italics. This allows the tags to be 
identified as tag 1 or tag 2 following their removal and analysis by mass spectrometry. 
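
To make the degeneracy of the 8-residue tag oligonucleotide concrete, the following sketch expands each degenerate codon into the amino acids it can encode, using the standard genetic code and the IUPAC nucleotide codes. It is an illustration only; the function and variable names are assumptions of this example.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def residues(degenerate_codon: str) -> set[str]:
    """Return the set of one-letter amino acids a degenerate codon can encode."""
    options = (IUPAC[base] for base in degenerate_codon)
    return {CODON_TABLE["".join(codon)] for codon in product(*options)}


# Tag oligonucleotide from the text, one degenerate codon per residue.
TAG_OLIGO = "NAC NCC NGG NTG TKC VAG GNV CNT"

per_position = [residues(codon) for codon in TAG_OLIGO.split()]

n_tags = 1
for allowed in per_position:
    n_tags *= len(allowed)  # number of distinct peptide tags

if __name__ == "__main__":
    for position, allowed in enumerate(per_position, start=1):
        print(position, "".join(sorted(allowed)))
    print("distinct tags:", n_tags)
```

The residue sets obtained this way agree with the 'Xaa' annotations given for the barcode peptide in the sequence listing, and no position can encode a stop codon. Multiplying the set sizes gives 17,280 distinct 8-residue tags.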

Single tag. 

Forward Oligo; 5' GCG CTG CAG GAY GGN CGN NAC NCC NGG NTG TKC VAG GNV CNT 
TAG CTC GAG CTA 3' 

Reverse Oligo; 5' TAG CTC GAG CTA ANG BNC CTB GMA CAN CCN GGN GTN CCG 
CCC GTC CTG CAG CGC 3' 






Double tag. 

Forward Oligo; 5' GCG CTG CAG GAY GGN CGN NAC NCC NGG NTG TKC VAG GNV CNT 
GAY GGN CGN NAC NCC NGG NTG TKC VAG GNV CNT TAG CTC GAG CTA 3' 

Reverse Oligo; 5' TAG CTC GAG CTA ANG BNC CTB GMA CAN CCN GGN GTN CCG CCC 
GTC ANG BNC CTB GMA CAN CCN GGN GTN CCG CCC GTC CTG CAG CGC 3' 

Example 4 

In order to select high affinity binders, the initial library was transferred into E. coli 
DH5α by electroporation (Bio-rad) and plated onto L agar containing 100 µg/ml 
ampicillin and 1% glucose and incubated at 37°C overnight. The transformed cells 
were harvested and used to inoculate a fresh batch of L broth containing 100 µg/ml 
ampicillin. The remainder of the library should be retained and stored at -70°C and 
used as starting material for the rescue of high affinity clones, as described later. The 
newly inoculated cultures were incubated for 2 hours at 37°C prior to the addition of 
isopropylthio-β-D-galactoside (IPTG) to a final concentration of 0.1mM. The cultures 
were then incubated at 37°C for a further 3 hours. 

100 ml cultures of bacteria producing the soluble Fab library were centrifuged at 4000 
rpm for 20 minutes at 4°C and the resulting pellet resuspended in phosphate buffered 
saline containing 1 mM EDTA. During agitation for 5-20 minutes on ice, the 
EDTA permeabilises the outer membrane and allows the periplasmic contents to leak 
out. The suspension was then clarified by centrifugation and the supernatant used in 
subsequent steps. Alternative protocols for the release of the periplasmic contents 
could also be utilised (Molloy, P. et al. Journal of Applied Bacteriology, 78: 359-365, 
1995 and Molecular Cloning, A Laboratory Manual, eds. Sambrook J., Fritsch EF. and 
Maniatis T., Cold Spring Harbor Laboratory Press 1989, New York, USA). 

The periplasmic extract, containing the Fab library, was aliquoted into 
Nunc-immunotubes which had been coated overnight with soluble human CD4 (Intracel 
Corp., Issaquah, WA) at a concentration of 1 µg/ml in phosphate buffered saline (PBS) 
pH7.4, at room temperature in a humidified chamber. Non specific binding was 
reduced by incubating the tubes for 1 hour at room temperature with PBS containing 
0.05% Tween 20, 2% bovine serum albumin (BSA) and 0.05% thimerosal (Sigma) 
prior to addition of the soluble Fab. After allowing the Fab to bind to the CD4 antigen 
for 1 hour at room temperature, the unbound Fab was eliminated by washing the tubes 
20 times with PBS, 0.05% Tween 20. 

In order to identify the amino acid sequence of those Fabs which remain bound, the 
mass tag was removed with Factor Xa using standard protocols. The mass tag was 
then analysed by MALDI-TOF (MS/MS) spectrometry, in which the molecular weight 
of each tag was determined and the sequence information then obtained by analysis of 
the secondary ionisation events. By combining this information the amino acid sequence 
of the tags could be assigned. 

In some instances it may be necessary to increase the efficiency of protease cleavage 
by eluting the bound Fab, neutralising and purifying the Fab from the other E.coli 
proteins by affinity purification using a sepharose-anti Ck column (Pierce Warriner, 
Cheshire, UK) prepared according to the manufacturer's instructions. The mass tag can 
then be removed from the bound Fab using Factor Xa. 
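
The mass determination step described above can be illustrated with a short calculation. The sketch below computes the monoisotopic mass of a peptide from standard amino acid residue masses. The tag sequences used are hypothetical examples chosen from the residues permitted at each position of the 8-residue tag; any additional residues left on the released fragment by Factor Xa cleavage are not modelled, and the names used are assumptions of this example.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of the terminal H and OH groups


def peptide_mass(sequence: str) -> float:
    """Monoisotopic mass of an unmodified linear peptide (neutral molecule)."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


tag_lys = "NTRMCKEH"  # hypothetical tag with Lys at position 6
tag_gln = "NTRMCQEH"  # the same hypothetical tag with Gln at position 6

mass_lys = peptide_mass(tag_lys)
mass_gln = peptide_mass(tag_gln)
delta = mass_lys - mass_gln  # Lys and Gln differ by only about 0.036 Da

if __name__ == "__main__":
    print(f"{tag_lys}: {mass_lys:.4f} Da")
    print(f"{tag_gln}: {mass_gln:.4f} Da")
    print(f"difference: {delta:.4f} Da")
```

Because Lys and Gln are both permitted at position 6 and are nearly isobaric, molecular weight alone may not separate every pair of tags, which is consistent with the use of MS/MS fragmentation to obtain sequence information.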

Following the identification of the mass tag, a further two oligonucleotides were 
produced. The 3' oligonucleotide encodes the sequence of the mass tag while the 5' 
oligonucleotide is OL001, which encodes the sequence at the N-terminus of the Fab. 

Positive strand; 5' GG GCA GAT CTT TAA CTT TAA GAA GGA GAT ATA CAT 
ATG AAA TAC CTA TTG CCT ACG G 3' 

Negative strand; 5' TAG CTC GAG CTA ANG BNC CTB GMA CAN CCN GGN GTN CCG CCC 
GTC ANG BNC CTB GMA CAN CCN GGN GTN CCG CCC GTC CTG CAG CGC 3' 

The clone containing the high affinity binder was rescued by adding 10 µl of the E. 
coli library to a PCR reaction containing the oligonucleotides described above. The 
conditions required for this reaction may vary depending upon the oligonucleotides 
being utilised. Following amplification, the PCR product was sequenced and 
subsequently purified from low melting point agarose, digested with AatII, which 
occurs at the N-terminus of CDR3 of the kappa light chain, and SanDI, which occurs at 
the C-terminus of CDR3 of the heavy chain in vector pC5A8-07, and transferred into 
vector pC5A8-07 which had been digested with the same restriction endonucleases, 
using standard protocols (see Molecular Cloning, A Laboratory Manual, eds. Sambrook 
J., Fritsch EF. and Maniatis T., Cold Spring Harbor Laboratory Press 1989, New York, 
USA). The resulting plasmid was transferred into E. coli DH5α by electroporation 
using standard protocols and stored at -70°C. Alternatively, the product of the PCR 
reaction could be digested with a number of alternative restriction endonucleases and 
transferred into alternative vectors for Fab expression. 

In some cases a number of mass tags may be present following the initial round of 
panning. In this case, a library of clones is amplified from the stored library using a 
mixture of 3' oligonucleotides. This limited library can then be subjected to further 
rounds of panning, the bound clones can be re-analysed by MALDI-TOF and the 
sequence of the internal tags used to create a limited repertoire of PCR primers. 

In order to confirm the affinity of the selected anti-CD4 specific Fab, periplasmic 
extracts should be prepared as described above and used immediately in a CD4 
specific ELISA. The apparent affinity is a combination of the actual affinity and the 
concentration of the Fab; therefore the concentration of the Fab should be established 
by carrying out an additional capture ELISA on the same extract in which a standard 
concentration curve is produced against the FLAG tag or the human Ck domain 
(McGregor DP., Molloy PE., Cunningham C. and Harris WJ. Molecular Immunology 
31, 219-226, 1994). 



Example 5 

In this example, human p53 protein was modified with a chemical tag at its N 
terminus, cleaved with a protease, the chemically tagged peptide then recovered using 
a tag-specific monoclonal antibody and the peptide then analysed by MALDI-ToF. 

p53 protein was a gift from Dr Borek Vojisek (University of Brno, Czech Republic). 
100ug of p53 protein was treated with the succinimide ester of (methyl sulphonyl) ethyl 
carbonate according to Mikolajczyk et al., Bioconjugate Chem., vol 7 (1996) p150-158 in 
order to block lysine side-chains. The blocked protein was dissolved at 1mg/ml in 0.1M 
sodium bicarbonate buffer pH8.5 and NHS-SS-biotin (Pierce, Chester, UK) was added 
to 100ug/ml final. The reaction was carried out for 6 hours at room temperature and 
terminated with ethanolamine. The protein mixture was then passed down a Sephadex 
G25 column (Pharmacia, Milton Keynes, UK) in PBS and the void volume collected 
using A280 measurements of the eluates. 40ul of eluate containing 2ug p53 was then 
heat denatured (95°C for 5 mins), cooled to 37°C and 1ug endoproteinase Arg-C (from C. 
histolyticum, Calbiochem, Nottingham, UK) was added and the mixture incubated at 
37°C for 1 hour. Then 10ul of streptavidin-agarose (Sigma, Poole, UK) in PBS was 
added and the mixture shaken for 10 minutes. The agarose was pelleted at 16000g for 
1 min and washed three times in TSO buffer (75mM Tris.HCl, 200mM NaCl, 0.5% 
N-octyl glucoside, pH8) and three times in TSMK (10mM Tris.HCl, 200mM NaCl, 5mM 
2-mercaptoethanol, pH8). Finally, 10ul of a saturated solution of 
alpha-cyano-4-hydroxycinnamic acid in 1% aqueous trifluoroacetic acid/acetonitrile (1:1 v/v) was 
added to the washed beads and 1ul of this was loaded onto the mass spectrometer chip. 
The analysis was carried out using a Perseptive Biosystems Voyager-DE STR 
Biospectrometry Workstation (Perseptive Biosystems). The mass spectra were 
collected by adding spectra from 200 laser shots. 

The results showed a major peak corresponding to the 65 amino acid N terminal 
Arg-C endoprotease fragment with no significant levels of other p53 Arg-C peaks. 
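
Endoproteinase Arg-C cleaves on the C-terminal side of arginine, so a tag attached at the N terminus is recovered on the fragment running from residue 1 to the first arginine. The sketch below shows a simple in-silico digest of this kind. The sequence used is a hypothetical toy sequence, not p53, and the function ignores missed cleavages and any sequence-context effects on the enzyme; the names are assumptions of this example.

```python
def arg_c_digest(sequence: str) -> list[str]:
    """Split a protein sequence after every arginine (R)."""
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue == "R":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing fragment after the last arginine, if any.
        fragments.append(sequence[start:])
    return fragments


toy_protein = "MSEPQSDLKRTAGWRLVEK"  # hypothetical sequence for illustration
fragments = arg_c_digest(toy_protein)
n_terminal_fragment = fragments[0]   # the fragment that would carry an N-terminal tag

if __name__ == "__main__":
    print(fragments)
    print("N-terminal fragment length:", len(n_terminal_fragment))
```

Only the first fragment would be retained on streptavidin-agarose in a scheme like the one above, since the lysine side-chains are blocked before biotinylation.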

Example 6 

The method of example 5 was repeated except that the N terminal biotin-tagged 
peptide was used to isolate a single-chain Fv antibody fragment from a phage display 
library of single-chain Fv's. Subsequently, the single-chain Fv was used to isolate the 
N-terminal peptide fragment from a protease digest of the test protein as confirmed by 
MALDI-ToF. An extract of normal human brain, prepared as in example 4, was 
conjugated to KLH according to Harlow and Lane, "Antibodies" (1988) (Cold Spring 
Harbor Publications) and used to immunise two BalbC mice. 2 doses were given 
intra-peritoneally with an interval of 4 weeks between them. 3 to 4 days after the 2nd 
inoculation, the mice were sacrificed and spleens removed by dissection. Spleen 
mRNA preparation was then initiated using the QuickPrep™ mRNA purification kit 
(Pharmacia) according to the manufacturer's instructions. 

The Pharmacia Recombinant Phage Antibody System (Pharmacia) was used to 
produce a library of mouse single chain Fvs (ScFv). First-strand cDNA was generated 
from the mRNA using M-MuLV reverse transcriptase and random hexamer primers. 
Antibody heavy and light chain genes were then amplified using specific heavy and 
light chain primers complementary to conserved sequences flanking the antibody 
variable domains. The 340 and 325 base pair products generated for heavy and light 
chain DNA respectively were separately purified following agarose gel 
electrophoresis. These were then assembled into a single ScFv construct using a DNA 
linker-primer mix to give the VH region joined by a (Gly4Ser)3 peptide to the VL 
region. The assembled ScFv were amplified with primers designed to insert SfiI and 
NotI sites at the 5' and 3' ends respectively, giving an 800 bp product. This segment 
was purified, sequentially digested with SfiI and NotI, and repurified. The fragment 
was then ligated into SfiI and NotI cut pCANTAB 5 phagemid vector. pCANTAB 5 
contains the gene encoding the Phage Gene 3 protein (g3p) and the ScFv is inserted 
adjacent to the g3 signal sequence such that it will be expressed as a g3p fusion 
protein. Competent E.coli TG1 cells were transformed with the pCANTAB 5/ScFv 
phagemid then subsequently infected with the M13K07 helper phage. The resulting 
recombinant phage contained DNA encoding the ScFv genes and displayed one or 
more copies of recombinant antibody as fusion proteins at their tips. 

Phage-displayed ScFv that bind to the peptides were then selected or enriched by 
panning. Briefly, the biotinylated and protease treated p53 preparation from example 
1 was applied to a streptavidin-coated glass slide (Radius Biosciences, Waltham, 
USA) and the slide was washed four times in PBS. After blocking with 2% non-fat 
dry milk in PBS, the phage preparation was applied and incubated for 1 hour. After 
washing 10 times with TBS/0.05% Tween 20, peptide reactive recombinant phage 
were detected with horse radish peroxidase conjugated anti-M13 antibody and 
revealed with o-phenylene diamine chromogenic substrate. These phage were 
subsequently eluted with 0.1M glycine.HCl pH2.2 and 1mg/ml BSA and neutralised 
with 2M Tris base. The eluted phage were amplified in JM103 grown in 25ml L broth. 
Two additional rounds of panning were undertaken and finally 10 single plaques were 
isolated, pooled and further amplified. An aliquot of 10^10 amplified phage was 
incubated for 2 hours at 4°C with 0.1ug of biotinylated and endoproteinase Arg-C 
digested p53 in TSO buffer. After 2 hours, 0.5ug of anti-M13 (Pharmacia) in TSO 
was added and incubated for 1 hour following which 5ul of protein A/G agarose 
(Sigma) was added and the mixture incubated for a further 0.5 hours with swirling. 
The agarose beads were then pelleted, washed as in example 1 above and analysed by 
mass spectrometry. 

The results showed the same major peak as in example 1 corresponding to the 65 
amino acid N terminal Arg-C endoprotease fragment. 

Example 7 

In this example, a gene fragment encoding a test protein was subjected to priming with 
a synthetic oligonucleotide encoding a polyhistidine tag. The cDNAs were expressed 
by in vitro transcription and translation (IVTT) and the tagged peptide fragments were 
then isolated using a nickel chelate column. These fragments were then used to isolate 
a single-chain Fv antibody fragment. Subsequently, the single-chain Fv was used to 
isolate a peptide fragment from a protease digest of the test protein as confirmed by 
mass spectrometry. 

Example 8 

The method of example 6 was repeated using a total protein preparation from cells and 
the chemically tagged peptides were used to isolate a collection of single-chain Fv 
antibody fragments. Subsequently, a mixture of twelve of these single-chain Fv's was 
used to isolate peptide fragments from a protease digest of the test protein and 
analysed by mass spectrometry. 

CLAIMS: 



1. A method of protein identification, screening and/or sequencing comprising 
providing a library of individual proteins, one or more of which may bind to a target 
of interest, wherein each individual protein includes in its sequence a "barcode" 
sequence, which can be used to identify each individual protein in the library. 



2. A method as claimed in claim 1 wherein the individual "barcode" sequences 
are encoded by one or more nucleic acid sequences inserted into the genes encoding 
the individual proteins in the library. 



3. A method as claimed in claim 2 wherein the "barcode" sequence or sequences 
is/are flanked by one or more recognition sites for endoprotease digestion. 

4. A method as claimed in claim 3 wherein the recognition site is the site for an 
endopeptidase such as enterokinase or Factor Xa. 

5. A method as claimed in claim 3 or claim 4 wherein the library of proteins is 
brought into contact/association with one or more target moieties, eg target proteins. 


6. A method as claimed in claim 5 wherein the proteins and one or more target 
moieties will bind in solution. 



7. A method as claimed in claim 5 or claim 6 wherein after binding, the 
complexes of protein/target moiety are isolated, followed by digestion with 
endoprotease to release the "barcode" sequence or sequences. 

8. A method as claimed in claim 7 wherein the released "barcode" sequence or 
sequences is/are used to design one or more synthetic oligonucleotides, eg primers, for 
the recovery or amplification of one or more genes encoding those proteins which bind 
to the target moiety(ies). 

9. A method as claimed in claim 8 wherein mass spectrometry is used to 
determine the mass of any released "barcode" sequences, which can in turn identify 
the released sequence or sequences, or wherein mass spectrometry is used to 
determine the sequence of any released "barcode" directly. 

10. A method as claimed in any one of claims 1 to 9 wherein the library of proteins 
is a library of antibodies. 

11. A method as claimed in claim 10 wherein the library of proteins is a library of 
antibody domains, eg recombinant antibody domains such as Fvs, which include 
antibody variable regions. 

12. A method as claimed in claim 11 wherein the library comprises Fvs, consisting 
of two chains (heavy and light-chain derived chains, VH and VL). 

13. A method as claimed in claim 12 wherein the VH and VL chains each have 
their own "barcode" sequence. 


14. A method as claimed in any one of claims 10 to 13 wherein the "barcode" 
sequence is C terminal to the Fv sequence. 

15. A library of proteins as defined in any one of claims 1 to 14. 

16. A method of screening a protein library comprising screening said library for 
one or more desired properties, followed by dereplication to identify one or more 
individual proteins in the library having the desired property. 

17. A method as claimed in claim 16 wherein the library is screened for binding to 
a target moiety. 

18. A method as claimed in claim 17 wherein binding is detected by mass 
spectrometry, particularly matrix-assisted laser desorption/ionisation time-of-flight 
(MALDI-ToF) spectrometry. 

19. A method as claimed in claim 16 wherein the library is screened for a specific 
biological activity. 

20. A method as claimed in claim 17 or claim 18 wherein the target is a complex 
mixture, eg a mixture of molecules, whole cells or cell membranes. 

21. A method of protein identification and/or sequencing comprising 
providing a library of individual proteins, one or more of which may bind to a target of 
interest, wherein each individual protein, together with its gene, is bound to an 
"associating moiety". 

22. A method as claimed in claim 21 wherein the library of proteins is 
brought into contact with the target of interest either before or after the "associating 
moiety". 


23. A method as claimed in claim 21 or claim 22 wherein after screening for 
binding to the target the library is dereplicated to identify one or more proteins with a 
desirable property, proteins which bind to the target. 

24. A method as claimed in any one of claims 21 to 23 where the 
"associating moiety" is a particle. 

25. A method as claimed in claim 24 wherein the particle is a latex bead. 



26. A method as claimed in any one of claims 21 to 23 wherein the 
"associating moiety" is a protein or protein complex. 

27. A method as claimed in claim 26 wherein the "associating moiety" is 
avidin or streptavidin and each of the proteins in the library and their associated genes 
are biotinylated. 

28. A method as claimed in claim 21 or claim 22 wherein the "associating 
moiety" is a bispecific binding molecule capable of binding to both the proteins and 
genes. 

29. A method as claimed in any one of claims 21 to 23 wherein the 
"associating moiety" is a living cell or cellular virus such as a bacteria or 
bacteriophage. 

30. A method as claimed in any one of claims 21 to 29 wherein one or other 
molecules which alter the properties of the proteins in the library are bound to the 
"associating moiety". 

31. A method as claimed in any one of claims 21 to 30 wherein the genes 
encoding the proteins in the library are attached to the "associating moiety" prior 
to synthesis of the individual proteins. 

32. A method as claimed in any one of claims 21 to 31 wherein the library of 
proteins is a library of antibody proteins, eg a library of antibody domains such as Fvs. 

33. A method of protein identification and/or sequencing comprising 
providing a library of individual proteins, one or more of which may bind to a target of 
interest, wherein each individual protein is attached to an individual "coding moiety". 



34. A method as claimed in claim 33 wherein the "coding moieties" are 
particles with unique identifier "codes". 

35. A method as claimed in claim 34 wherein the "codes" are different ratios 
of measurable signal, eg fluorescent, chemiluminescent or radioactive labels, or a 
physical feature such as a unique marking. 

36. A method for analysing mixtures of proteins comprising: 

(i) digestion or cleavage of the protein mixture; 
(ii) fractionation of the resultant peptides; and 
(iii) analysis of the resultant peptides by means of their mass and/or 
sequence. 

37. A method as claimed in claim 36 wherein the fractionation in step (ii) is 
carried out using a library of protein binding agents. 

38. A method as claimed in claim 36 wherein the resultant peptides are 
subjected to physical fractionation and/or chemical tagging as part of the fractionation 
step. 

39. A method as claimed in claim 36 wherein the resultant peptides are 
subjected to addition of one or more amino acids as part of the fractionation step. 

40. A method as claimed in any one of claims 37 to 39 wherein the library of 
protein binding agents is a library of antibodies or antibody fragments. 

41. A method as claimed in any one of claims 37 to 39 wherein the protein 
binding agents are major histocompatibility proteins, T cell receptors and natural 
proteins or protein domains involved in protein-protein binding interactions, such as 
SHI domains. 

42. A method as claimed in claim 40 or claim 41 wherein the library of 
protein binding agents is pre-selected for binding to one or more proteins or peptides 
derived from the protein mixture or a related protein mixture under analysis. 

43. A method as claimed in claim 42 wherein the protein mixture is derived 
from a normalised recombinant gene library. 

44. A method as claimed in any one of claims 36 to 43 wherein the protein 
mixture is initially bound to a solid phase prior to digestion or cleavage either via the 
N or C-terminus or via specific amino acids or via specific sequences of amino acids. 

45. A method as claimed in any one of claims 36 to 43 wherein specific 
amino acids or modified amino acids found in the proteins are derivatised prior to 
binding to a solid phase, such binding occurring either before or after digestion or 
cleavage of the protein mixtures. 

46. A method as claimed in claim 45 wherein the specific, or modified, 
amino acids are derivatised with biotin prior to binding to avidin or streptavidin. 

47. A method as claimed in claim 45 wherein specific, or modified, amino 
acids are derivatised with ligands prior to binding to ligand-specific affinity reagents. 

48. A method as claimed in any one of claims 36 to 43 wherein specific 
naturally modified amino acids found in the proteins are bound to a solid phase using 
modification specific affinity reagents, such binding occurring either before or after 
digestion or cleavage of the protein mixtures. 

49. A method as claimed in any one of claims 45 to 48 wherein more than 
one cycle of digestion/cleavage and derivatisation is carried out. 

50. A method as claimed in claim 49 wherein mass analysis is carried out 
after each cycle of digestion or cleavage. 

51. A method as claimed in any one of claims 36 to 50 wherein peptides 
released after digestion/cleavage are fractionated using physical methods such as 
HPLC before or after fractionation using protein binding agents. 

<160> 69 

<170> PatentIn Ver. 2.1 

<210> 1 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 

oligonucleotide for an 8 amino acid barcode peptide 






<220> 
<221> CDS 
<222> (1)..(24) 
<223> 

<220> 
<221> misc_feature 
<222> (1)..(1) 
<223> n=a,t,g,c 

<220> 
<221> misc_feature 
<222> (4)..(4) 
<223> n=a,t,g,c 

<220> 
<221> misc_feature 
<222> (7)..(7) 
<223> n=a,t,g,c 

<220> 
<221> misc_feature 
<222> (10)..(10) 
<223> n=a,t,g,c 

<220> 
<221> misc_feature 
<222> (14)..(14) 
<223> k=g,t 

<220> 
<221> misc_feature 
<222> (16)..(16) 
<223> v=a,g,c 

<220> 
<221> misc_feature 
<222> (20)..(20) 
<223> n=a,g,t,c 

<220> 
<221> misc_feature 
<222> (21)..(21) 
<223> v=a,g,c 

<220> 
<221> misc_feature 
<222> (23)..(23) 
<223> n=a,t,g,c 



<400> 1 

nac ncc ngg ntg tkc vag gnv cnt 
Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 
1 5 



<210> 2 
<211> 8 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
barcode peptide 

<220> 
<221> misc_feature 
<222> (1)..(1) 
<223> The 'Xaa' at location 1 stands for Asn, Asp, His, or Tyr. 

<220> 
<221> misc_feature 
<222> (2)..(2) 
<223> The 'Xaa' at location 2 stands for Thr, Ala, Pro, or Ser. 

<220> 
<221> misc_feature 
<222> (3)..(3) 
<223> The 'Xaa' at location 3 stands for Arg, Gly, or Trp. 

<220> 
<221> misc_feature 
<222> (4)..(4) 
<223> The 'Xaa' at location 4 stands for Met, Val, or Leu. 

<220> 
<221> misc_feature 
<222> (5)..(5) 
<223> The 'Xaa' at location 5 stands for Cys, or Phe. 

<220> 
<221> misc_feature 
<222> (6)..(6) 
<223> The 'Xaa' at location 6 stands for Lys, Glu, or Gln. 

<220> 
<221> misc_feature 
<222> (7)..(7) 
<223> The 'Xaa' at location 7 stands for Glu, Asp, Gly, Ala, or Val. 

<220> 
<221> misc_feature 
<222> (8)..(8) 
<223> The 'Xaa' at location 8 stands for His, Arg, Pro, or Leu. 

<400> 2 

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 
1 5 



<210> 3 
<211> 14 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Linker 
peptide 

<400> 3 

Glu Gly Lys Ser Ser Gly Ser Gly Ser Glu Ser Lys Val Asp 
1 5 10 



<210> 4 
<211> 8 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Flag 
epitope peptide 

<400> 4 

Met Asp Tyr Lys Asp Asp Asp Lys 
1 5 



<210> 5 
<211> 53 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
RD5' Flag 

<400> 5 

gcggatccca tatggactac aaagacgatg acgacaaaca ggtgcagctg cag 



<210> 6 
<211> 35 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
RD3' 

<400> 6 

gcgaattcgt ggtggtggtg gtggtgtgac tctcc 



<210> 7 
<211> 50 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
Fos1for 



<400> 7 

atggaattcc tcgagaccga caccctacag gcggaaaccg accagctgga 



<210> 8 
<211> 50 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
Fos50rev 

<400> 8 

tcgcgatttc ggtttgcagc gcggattttt cgtcttccag ctggtcggtt 



<210> 9 
<211> 50 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
Fos71for 

<400> 9 

aaaccgaaat cgcgaacctg ctgaaagaaa aagaaaagct ggagttcatc 



<210> 10 
<211> 50 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
Fos155rev 

<400> 10 

ggaagcttga attccgccgg acggtgtgcc gccaggatga actccagctt 



<210> 11 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
Fos1 fS 

<400> 11 

atggaattcc tcgagacc 



<210> 12 
<211> 18 



<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
FOS155 rS 

<400> 12 

ggaagcttga attccgcc 



<210> 13 
<211> 28 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
for 340 VH amplification 

<400> 13 

cagctgcagg agtctggggg aggcttag 



<210> 14 
<211> 36 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
for 340 VH amplification 

<400> 14 

tcagtagacg gtgaccgagg ttccttgacc ccagta 



<210> 15 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
for 340 VK amplification 

<400> 15 

gtgacattga gctcacacag tctcct 



<210> 16 
<211> 28 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
for 340 VK amplification 



<400> 16 

cagcccgttt tatctcgagc ttggtccg 



<210> 17 
<211> 47 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
RD5' His 

<400> 17 

gcggatccca tatgcaccat catcaccatc accaggtgca gctgcag 



<210> 18 
<211> 50 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
Jun1for 

<400> 18 

atgagaattc tcgagcgtat cgctcgtctg gaagaaaaag ttaaaaccct 



<210> 19 
<211> 50 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
Jun55rev 

<400> 19 

tagcggtgga agccagttcg gagttctgag ctttcagggt tttaactttt 



<210> 20 
<211> 50 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
Jun71for 

<400> 20 

tggcttccac cgctaacatg ctgcgtgaac aggttgctca gctgaaacag 



<210> 21 
<211> 45 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
Jun146rev 

<400> 21 

catgcgaatt cgtggttcat aactttctgt ttcagctgag caacc 



<210> 22 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
Jun1for-S 

<400> 22 

atgagaattc tcgagcg 



<210> 23 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
Jun146rev-S 

<400> 23 

catgcgaatt cgtggttc 



<210> 24 
<211> 21 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
Bio T7 

<400> 24 

agatctcgat cccgcaaatt a 



<210> 25 
<211> 21 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Primer 
petrev 

<400> 25 

aaataggcgt atcacgaggc c 



<210> 26 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Linker I 
of oligonucleotide pool 

<400> 26 

ggccgcgagg aagaggaaat gatggc 



<210> 27 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Linker II 
of oligonucleotide pool 

<220>
<221> misc_feature
<222> (21)..(21)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (24)..(24)
<223> n=a,t,g,c

<400> 27 

ggccgcgagg aagaggaaca ncangc 



<210> 28 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Linker III 
of oligonucleotide pool 

<220>
<221> misc_feature
<222> (21)..(21)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (24)..(24)
<223> n=a,t,g,c

<400> 28 

ggccgcgagg aagaggaaag nagngc 



<210> 29 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Linker IV 
of oligonucleotide pool 

<220>
<221> misc_feature
<222> (21)..(21)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (24)..(24)
<223> n=a,t,g,c

<400> 29 

ggccgcgagg aagaggaaaa naangc 



<210> 30 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Linker V 
of oligonucleotide pool 

<220>
<221> misc_feature
<222> (21)..(21)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (24)..(24)
<223> n=a,t,g,c

<400> 30 

ggccgcgagg aagaggaaga ngangc



<210> 31
<211> 26
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Linker VI 
of oligonucleotide pool 

<220>
<221> misc_feature
<222> (21)..(21)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (24)..(24)
<223> n=a,t,g,c

<400> 31 

ggccgcgagg aagaggaatt nttngc 



<210> 32 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Linker VII 
of oligonucleotide pool 

<220>
<221> misc_feature
<222> (7)..(7)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (10)..(10)
<223> n=a,t,g,c

<400> 32 

ggccgcnaan aactccttct cctcgc 



<210> 33 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Linker VIII 
of oligonucleotide pool 

<220>
<221> misc_feature
<222> (7)..(7)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (10)..(10)
<223> n=a,t,g,c

<400> 33 

ggccgcntcn tcctccttct cctcgc 



<210> 34 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Linker IX 
of oligonucleotide pool 

<220>
<221> misc_feature
<222> (7)..(7)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (10)..(10)
<223> n=a,t,g,c

<400> 34 

ggccgcngtn gtctccttct cctcgc 



<210> 35 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Linker X 
of oligonucleotide pool 

<220>
<221> misc_feature
<222> (7)..(7)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (10)..(10)
<223> n=a,t,g,c

<400> 35 

ggccgcnctn ctctccttct cctcgc 



<210> 36 
<211> 26 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Linker XI 
of oligonucleotide pool 

<220>
<221> misc_feature
<222> (7)..(7)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (10)..(10)
<223> n=a,t,g,c

<400> 36 

ggccgcnagn agctccttct cctcgc 



<210> 37 

<211> 26 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Linker XII 
of oligonucleotide pool 

<400> 37 

ggccgccatc atctccttct cctcgc 



<210> 38 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
T7 promoter sequence 

<400> 38 

ttaatacgac tcactata 



<210> 39 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
DNA linker 

<400> 39 

agctaatacg actcactata



<210> 40
<211> 8 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: C-terminal
FLAG tag sequence 

<400> 40 

Asp Tyr Lys Asp Asp Asp Asp Lys
1               5



<210> 41 
<211> 57 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
OL 001 sequence 

<220>
<221> CDS
<222> (36)..(56)
<223> pelB leader sequence

<400> 41

gggcagatct ttaactttaa gaaggagata tacat atg aaa tac cta ttg cct
                                       Met Lys Tyr Leu Leu Pro
                                       1               5

acg g
Thr



<210> 42 
<211> 7 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
OL 001 peptide sequence 

<400> 42 

Met Lys Tyr Leu Leu Pro Thr
1               5



<210> 43 
<211> 43 
<212> DNA 
<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: Synthetic 
OL 002 sequence 

<400> 43 

gggtctgggt cataacgata tcggccatcg ctggttgggc agc



<210> 44 
<211> 42 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
OL 003 sequence 

<400> 44 

ggtaccaaac tggagatcaa acggactgtg gctgcaccat ct 



<210> 45 
<211> 42 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
OL 004 sequence

<400> 45 

agatggtgca gccacagtcc gtttgatctc cagtttggta cc 



<210> 46 
<211> 39 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
OL 005 sequence 

<400> 46 

gatcgaattc ctaacactct ccgcggttga agctctttg 



<210> 47 
<211> 37 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
OL 006 sequence 

<400> 47 

gatcgaattc taactttaag aaggagatat acatatg



<210> 48
<211> 42 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
OL 007 sequence

<400> 48 

ggactgaacc agttggactt cggccatcgc tggttgggca gc 



<210> 49 

<211> 41 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
OL 008 sequence 

<400> 49 

accctggtta ccgtctcctc agcctccacc aagggcccat c 



<210> 50 
<211> 43 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
OL 009 sequence 

<400> 50 

gatgggccct tggtggaggc tgaggagacg gtaaccaggg tac 



<210> 51 
<211> 36 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
OL 010 sequence 

<400> 51 

gatcgagctc tgctttcttg tccaccttgg tgttgc 



<210> 52 
<211> 52 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> Description of Artificial Sequence: Synthetic 
OL 011 sequence

<400> 52 

cccaaatctt gcgctgcaga ctacaaagac gacgacgaca aatagctcga gc 



<210> 53 

<211> 56 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
OL 012 sequence 

<400> 53 

ttaagctcga gctatttgtc gtcgtcgtct ttgtagtctg cagcgcaaga tttggg 



<210> 54 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
OL 013 sequence 

<400> 54 

gaagacgtcg ctgtttac 



<210> 55 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
OL 014 sequence 

<400> 55 

ggtaccaagc ttgagatc 



<210> 56 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
OL 015 sequence

<400> 56

ctactgcgcg cgtgaaaaag 



<210> 57 

<211> 17 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
OL 016 sequence 

<400> 57 

gggtcagggg accctgg 



<210> 58 
<211> 77 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
oligonucleotide for CDR3 light chain; positive 
strand 

<220>
<221> CDS
<222> (1)..(75)
<223>

<220>
<221> misc_feature
<222> (31)..(32)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (33)..(33)
<223> s=g,c

<220>
<221> misc_feature
<222> (34)..(35)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (36)..(36)
<223> s=g,c

<220>
<221> misc_feature
<222> (37)..(38)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (39)..(39)
<223> s=g,c

<220>
<221> misc_feature
<222> (40)..(41)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (42)..(42)
<223> s=g,c

<220>
<221> misc_feature
<222> (43)..(44)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (45)..(45)
<223> s=g,c

<220>
<221> misc_feature
<222> (46)..(47)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (48)..(48)
<223> s=g,c

<220>
<221> misc_feature
<222> (49)..(50)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (51)..(51)
<223> s=g,c



<400> 58 

gaa gac gtc gct gtt tac tac tgc cag cag nns nns nns nns nns nns
Glu Asp Val Ala Val Tyr Tyr Cys Gln Gln Xaa Xaa Xaa Xaa Xaa Xaa
1               5                   10                  15

nns acc ttc ggt ggt ggt acc aag ctt gg
Xaa Thr Phe Gly Gly Gly Thr Lys Leu



<210> 59 
<211> 25 
<212> PRT

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: Synthetic
peptide for CDR3 light chain; positive strand

<220>
<221> misc_feature
<222> (11)..(11)
<223> The 'Xaa' at location 11 stands for Lys, Asn, Arg, Ser, Thr, Met,
Ile, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon,
Tyr, Trp, Cys, or Phe.

<220>
<221> misc_feature
<222> (12)..(12)
<223> The 'Xaa' at location 12 stands for Lys, Asn, Arg, Ser, Thr, Met,
Ile, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon,
Tyr, Trp, Cys, or Phe.

<220>
<221> misc_feature
<222> (13)..(13)
<223> The 'Xaa' at location 13 stands for Lys, Asn, Arg, Ser, Thr, Met,
Ile, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon,
Tyr, Trp, Cys, or Phe.

<220>
<221> misc_feature
<222> (14)..(14)
<223> The 'Xaa' at location 14 stands for Lys, Asn, Arg, Ser, Thr, Met,
Ile, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon,
Tyr, Trp, Cys, or Phe.

<220>
<221> misc_feature
<222> (15)..(15)
<223> The 'Xaa' at location 15 stands for Lys, Asn, Arg, Ser, Thr, Met,
Ile, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon,
Tyr, Trp, Cys, or Phe.

<220>
<221> misc_feature
<222> (16)..(16)
<223> The 'Xaa' at location 16 stands for Lys, Asn, Arg, Ser, Thr, Met,
Ile, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon,
Tyr, Trp, Cys, or Phe.

<220>
<221> misc_feature
<222> (17)..(17)
<223> The 'Xaa' at location 17 stands for Lys, Asn, Arg, Ser, Thr, Met,
Ile, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon,
Tyr, Trp, Cys, or Phe.

<400> 59

Glu Asp Val Ala Val Tyr Tyr Cys Gln Gln Xaa Xaa Xaa Xaa Xaa Xaa
1               5                   10                  15

Xaa Thr Phe Gly Gly Gly Thr Lys Leu
            20                  25
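
The 'Xaa' definitions above follow from the degenerate codon nns (n = a, t, g, c; s = g, c) used in the corresponding oligonucleotide. The sketch below, which assumes the standard genetic code, expands a degenerate codon and collects the residues it can encode. The names IUPAC, CODON_TABLE, expand_codon and encoded_residues are illustrative only and do not come from the application.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes that appear in this listing.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "n": "acgt", "s": "cg", "y": "ct", "k": "gt",
    "v": "acg", "b": "cgt", "m": "ac",
}

# Standard genetic code, codons enumerated in t, c, a, g order ('*' = stop).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def expand_codon(codon):
    """Return every concrete codon matched by a degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in codon.lower()))]

def encoded_residues(codon):
    """Return the set of one-letter residues a degenerate codon can encode."""
    return {CODON_TABLE[c] for c in expand_codon(codon)}

nns_codons = expand_codon("nns")
nns_residues = encoded_residues("nns")
```

Under these assumptions nns expands to 32 codons covering all twenty amino acids plus a single stop codon (tag), which agrees with the residue list given for each 'Xaa' position. The same function can be applied to the other degenerate codons in this listing, for example nac or tkc.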



<210> 60 
<211> 77 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
oligonucleotide for CDR3 light chain; negative 
strand 

<220>
<221> misc_feature
<222> (27)..(27)
<223> s=g,c

<220>
<221> misc_feature
<222> (28)..(29)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (30)..(30)
<223> s=g,c

<220>
<221> misc_feature
<222> (31)..(32)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (33)..(33)
<223> s=g,c

<220>
<221> misc_feature
<222> (34)..(35)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (36)..(36)
<223> s=g,c

<220>
<221> misc_feature
<222> (37)..(38)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (39)..(39)
<223> s=g,c

<220>
<221> misc_feature
<222> (40)..(41)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (42)..(42)
<223> s=g,c

<220>
<221> misc_feature
<222> (43)..(44)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (45)..(45)
<223> s=g,c

<220>
<221> misc_feature
<222> (46)..(47)
<223> n=a,t,g,c

<400> 60 

ccaagcttgg taccaccacc gaaggtsnns nnsnnsnnsn nsnnsnnctg ctggcagtag 
taaacagcga cgtcttc 
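
The negative-strand oligonucleotide of SEQ ID NO: 60 can be compared with the positive strand of SEQ ID NO: 58 by taking a reverse complement that also handles the ambiguity codes. The following is a sketch under the assumption that the two sequences are transcribed as shown, with the fourth codon of SEQ ID NO: 58 read as gct (Ala). The names COMPLEMENT and reverse_complement are illustrative.

```python
# Complement table covering plain bases and IUPAC ambiguity codes:
# s (g/c), n and w are self-complementary; y<->r, k<->m, v<->b, d<->h.
COMPLEMENT = str.maketrans("acgtnsykvbmrwdh", "tgcansrmbvkywhd")

def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) DNA string."""
    return seq.lower().translate(COMPLEMENT)[::-1]

# SEQ ID NO: 58 (positive strand) and SEQ ID NO: 60 (negative strand),
# written without the spacing used in the listing.
seq58 = "gaagacgtcgctgtttactactgccagcag" + "nns" * 7 + "accttcggtggtggtaccaagcttgg"
seq60 = "ccaagcttggtaccaccaccgaaggt" + "snn" * 7 + "ctgctggcagtagtaaacagcgacgtcttc"

strands_match = reverse_complement(seq58) == seq60
```

With these transcriptions the two 77-nucleotide strands are exact reverse complements, the nns codons of the positive strand appearing as snn on the negative strand.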



<210> 61 
<211> 70 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
oligonucleotide for CDR3 heavy chain; positive 
strand 

<220>
<221> CDS
<222> (2)..(70)
<223>

<220>
<221> misc_feature
<222> (14)..(15)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (16)..(16)
<223> s=g,c

<220>
<221> misc_feature
<222> (17)..(18)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (19)..(19)
<223> s=g,c

<220>
<221> misc_feature
<222> (20)..(21)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (22)..(22)
<223> s=g,c

<220>
<221> misc_feature
<222> (23)..(24)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (25)..(25)
<223> s=g,c

<220>
<221> misc_feature
<222> (26)..(27)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (28)..(28)
<223> s=g,c

<220>
<221> misc_feature
<222> (29)..(30)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (31)..(31)
<223> s=g,c

<220>
<221> misc_feature
<222> (32)..(33)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (34)..(34)
<223> s=g,c

<220>
<221> misc_feature
<222> (35)..(36)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (37)..(37)
<223> s=g,c

<220>
<221> misc_feature
<222> (38)..(39)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (40)..(40)
<223> s=g,c

<220>
<221> misc_feature
<222> (41)..(42)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (43)..(43)
<223> s=g,c

<400> 61 

c tac tgc gcg cgt nns nns nns nns nns nns nns nns nns nns ttc gct
  Tyr Cys Ala Arg Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Phe Ala
  1               5                   10                  15

tac tgg ggt cag ggg acc cct
Tyr Trp Gly Gln Gly Thr Pro
            20



<210> 62 
<211> 23 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
peptide for CDR3 heavy chain; positive strand 

<220>
<221> misc_feature
<222> (5)..(5)
<223> The 'Xaa' at location 5 stands for Lys, Asn, Arg, Ser, Thr, Met,
Ile, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon,
Tyr, Trp, Cys, or Phe.

<220>
<221> misc_feature
<222> (6)..(6)
<223> The 'Xaa' at location 6 stands for Lys, Asn, Arg, Ser, Thr, Met,
Ile, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon,
Tyr, Trp, Cys, or Phe.

<220>
<221> misc_feature
<222> (7)..(7)
<223> The 'Xaa' at location 7 stands for Lys, Asn, Arg, Ser, Thr, Met,
Ile, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon,
Tyr, Trp, Cys, or Phe.

<220>
<221> misc_feature
<222> (8)..(8)
<223> The 'Xaa' at location 8 stands for Lys, Asn, Arg, Ser, Thr, Met,
Ile, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon,
Tyr, Trp, Cys, or Phe.

<220>
<221> misc_feature
<222> (9)..(9)
<223> The 'Xaa' at location 9 stands for Lys, Asn, Arg, Ser, Thr, Met,
Ile, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon,
Tyr, Trp, Cys, or Phe.

<220>
<221> misc_feature
<222> (10)..(10)
<223> The 'Xaa' at location 10 stands for Lys, Asn, Arg, Ser, Thr, Met,
Ile, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon,
Tyr, Trp, Cys, or Phe.

<220>
<221> misc_feature
<222> (11)..(11)
<223> The 'Xaa' at location 11 stands for Lys, Asn, Arg, Ser, Thr, Met,
Ile, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon,
Tyr, Trp, Cys, or Phe.

<220>
<221> misc_feature
<222> (12)..(12)
<223> The 'Xaa' at location 12 stands for Lys, Asn, Arg, Ser, Thr, Met,
Ile, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon,
Tyr, Trp, Cys, or Phe.

<220>
<221> misc_feature
<222> (13)..(13)
<223> The 'Xaa' at location 13 stands for Lys, Asn, Arg, Ser, Thr, Met,
Ile, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon,
Tyr, Trp, Cys, or Phe.

<220>
<221> misc_feature
<222> (14)..(14)
<223> The 'Xaa' at location 14 stands for Lys, Asn, Arg, Ser, Thr, Met,
Ile, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, a stop codon,
Tyr, Trp, Cys, or Phe.

<400> 62 

Tyr Cys Ala Arg Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Phe Ala
1               5                   10                  15

Tyr Trp Gly Gln Gly Thr Pro



<210> 63 
<211> 70 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
oligonucleotide for CDR3 heavy chain; negative 
strand 

<220>
<221> misc_feature
<222> (28)..(28)
<223> s=g,c

<220>
<221> misc_feature
<222> (29)..(30)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (31)..(31)
<223> s=g,c

<220>
<221> misc_feature
<222> (32)..(33)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (34)..(34)
<223> s=g,c

<220>
<221> misc_feature
<222> (35)..(36)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (37)..(37)
<223> s=g,c

<220>
<221> misc_feature
<222> (38)..(39)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (40)..(40)
<223> s=g,c

<220>
<221> misc_feature
<222> (41)..(42)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (43)..(43)
<223> s=g,c

<220>
<221> misc_feature
<222> (44)..(45)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (46)..(46)
<223> s=g,c

<220>
<221> misc_feature
<222> (47)..(48)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (49)..(49)
<223> s=g,c

<220>
<221> misc_feature
<222> (50)..(51)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (52)..(52)
<223> s=g,c

<220>
<221> misc_feature
<222> (53)..(54)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (55)..(55)
<223> s=g,c

<220>
<221> misc_feature
<222> (56)..(57)
<223> n=a,t,g,c

<400> 63 

aggggtcccc tgaccccagt aagcgaasnn snnsnnsnns nnsnnsnnsn nsnnsnnacg 
cgcgcagtag 



<210> 64 
<211> 54 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Single 
tag; forward synthetic oligonucleotide 

<220>
<221> CDS
<222> (1)..(54)
<223>

<220>
<221> misc_feature
<222> (12)..(12)
<223> y=t,c

<220>
<221> misc_feature
<222> (15)..(15)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (18)..(18)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (19)..(19)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (22)..(22)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (25)..(25)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (28)..(28)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (32)..(32)
<223> k=t,g

<220>
<221> misc_feature
<222> (34)..(34)
<223> v=a,g,c

<220>
<221> misc_feature
<222> (38)..(38)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (39)..(39)
<223> v=a,g,c

<220>
<221> misc_feature
<222> (41)..(41)
<223> n=a,t,g,c

<400> 64 

gcg ctg cag gay ggn cgn nac ncc ngg ntg tkc vag gnv cnt tag ctc
Ala Leu Gln Asp Gly Arg Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa     Leu
1               5                   10                      15

gag cta
Glu Leu



<210> 65 
<211> 14 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Single 
tag; forward synthetic peptide 

<220>
<221> misc_feature
<222> (7)..(7)
<223> The 'Xaa' at location 7 stands for Asn, Asp, His, or Tyr.

<220>
<221> misc_feature
<222> (8)..(8)
<223> The 'Xaa' at location 8 stands for Thr, Ala, Pro, or Ser.

<220>
<221> misc_feature
<222> (9)..(9)
<223> The 'Xaa' at location 9 stands for Arg, Gly, or Trp.

<220>
<221> misc_feature
<222> (10)..(10)
<223> The 'Xaa' at location 10 stands for Met, Val, or Leu.

<220>
<221> misc_feature
<222> (11)..(11)
<223> The 'Xaa' at location 11 stands for Cys, or Phe.

<220>
<221> misc_feature
<222> (12)..(12)
<223> The 'Xaa' at location 12 stands for Lys, Glu, or Gln.

<220>
<221> misc_feature
<222> (13)..(13)
<223> The 'Xaa' at location 13 stands for Glu, Asp, Gly, Ala, or Val.

<220>
<221> misc_feature
<222> (14)..(14)
<223> The 'Xaa' at location 14 stands for His, Arg, Pro, or Leu.

<400> 65

Ala Leu Gln Asp Gly Arg Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
1               5                   10



<210> 66 
<211> 54 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Single 
tag; reverse synthetic oligonucleotide 

<220>
<221> misc_feature
<222> (14)..(14)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (16)..(16)
<223> b=g,c,t

<220>
<221> misc_feature
<222> (17)..(17)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (21)..(21)
<223> b=g,c,t

<220>
<221> misc_feature
<222> (23)..(23)
<223> m=a,c

<220>
<221> misc_feature
<222> (27)..(27)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (30)..(30)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (33)..(33)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (36)..(36)
<223> n=a,t,g,c

<400> 66 

tagctcgagc taangbncct bgmacanccn ggngtnccgc ccgtcctgca gcgc 



<210> 67 
<211> 87 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Double 
tag; forward synthetic oligonucleotide 

<220>
<221> CDS
<222> (1)..(87)
<223>

<220>
<221> misc_feature
<222> (12)..(12)
<223> y=t,c

<220>
<221> misc_feature
<222> (15)..(15)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (18)..(19)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (22)..(22)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (25)..(25)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (28)..(28)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (32)..(32)
<223> k=t,g

<220>
<221> misc_feature
<222> (34)..(34)
<223> v=a,g,c

<220>
<221> misc_feature
<222> (38)..(38)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (39)..(39)
<223> v=a,g,c

<220>
<221> misc_feature
<222> (41)..(41)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (45)..(45)
<223> y=t,c

<220>
<221> misc_feature
<222> (48)..(48)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (51)..(52)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (55)..(55)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (58)..(58)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (61)..(61)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (65)..(65)
<223> k=g,t

<220>
<221> misc_feature
<222> (67)..(67)
<223> v=a,g,c

<220>
<221> misc_feature
<222> (71)..(71)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (72)..(72)
<223> v=a,g,c

<220>
<221> misc_feature
<222> (74)..(74)
<223> n=a,t,g,c



<400> 67 

gcg ctg cag gay ggn cgn nac ncc ngg ntg tkc vag gnv cnt gay ggn
Ala Leu Gln Asp Gly Arg Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Asp Gly
1               5                   10                  15

cgn nac ncc ngg ntg tkc vag gnv cnt tag ctc gag cta
Arg Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa     Leu Glu Leu
            20                  25



<210> 68
<211> 25
<212> PRT

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: Double 
tag; forward synthetic peptide 

<220>
<221> misc_feature
<222> (7)..(7)
<223> The 'Xaa' at location 7 stands for Asn, Asp, His, or Tyr.

<220>
<221> misc_feature
<222> (8)..(8)
<223> The 'Xaa' at location 8 stands for Thr, Ala, Pro, or Ser.

<220>
<221> misc_feature
<222> (9)..(9)
<223> The 'Xaa' at location 9 stands for Arg, Gly, or Trp.

<220>
<221> misc_feature
<222> (10)..(10)
<223> The 'Xaa' at location 10 stands for Met, Val, or Leu.

<220>
<221> misc_feature
<222> (11)..(11)
<223> The 'Xaa' at location 11 stands for Cys, or Phe.

<220>
<221> misc_feature
<222> (12)..(12)
<223> The 'Xaa' at location 12 stands for Lys, Glu, or Gln.

<220>
<221> misc_feature
<222> (13)..(13)
<223> The 'Xaa' at location 13 stands for Glu, Asp, Gly, Ala, or Val.

<220>
<221> misc_feature
<222> (14)..(14)
<223> The 'Xaa' at location 14 stands for His, Arg, Pro, or Leu.

<220>
<221> misc_feature
<222> (18)..(18)
<223> The 'Xaa' at location 18 stands for Asn, Asp, His, or Tyr.

<220>
<221> misc_feature
<222> (19)..(19)
<223> The 'Xaa' at location 19 stands for Thr, Ala, Pro, or Ser.

<220>
<221> misc_feature
<222> (20)..(20)
<223> The 'Xaa' at location 20 stands for Arg, Gly, or Trp.

<220>
<221> misc_feature
<222> (21)..(21)
<223> The 'Xaa' at location 21 stands for Met, Val, or Leu.

<220>
<221> misc_feature
<222> (22)..(22)
<223> The 'Xaa' at location 22 stands for Cys, or Phe.

<220>
<221> misc_feature
<222> (23)..(23)
<223> The 'Xaa' at location 23 stands for Lys, Glu, or Gln.

<220>
<221> misc_feature
<222> (24)..(24)
<223> The 'Xaa' at location 24 stands for Glu, Asp, Gly, Ala, or Val.

<220>
<221> misc_feature
<222> (25)..(25)
<223> The 'Xaa' at location 25 stands for His, Arg, Pro, or Leu.

<400> 68

Ala Leu Gln Asp Gly Arg Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Asp Gly
1               5                   10                  15

Arg Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
            20                  25



<210> 69 
<211> 87 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: Double 
tag; reverse synthetic oligonucleotide 

<220>
<221> misc_feature
<222> (14)..(14)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (16)..(16)
<223> b=t,g,c

<220>
<221> misc_feature
<222> (17)..(17)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (21)..(21)
<223> b=t,g,c

<220>
<221> misc_feature
<222> (23)..(23)
<223> m=a,c

<220>
<221> misc_feature
<222> (27)..(27)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (30)..(30)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (33)..(33)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (36)..(36)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (47)..(47)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (49)..(49)
<223> b=t,g,c

<220>
<221> misc_feature
<222> (50)..(50)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (54)..(54)
<223> b=t,g,c

<220>
<221> misc_feature
<222> (56)..(56)
<223> m=a,c

<220>
<221> misc_feature
<222> (60)..(60)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (63)..(63)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (66)..(66)
<223> n=a,t,g,c

<220>
<221> misc_feature
<222> (69)..(69)
<223> n=a,t,g,c

<400> 69 

tagctcgagc taangbncct bgmacanccn ggngtnccgc ccgtcangbn cctbgmacan 
ccnggngtnc cgcccgtcct gcagcgc 